



W. B. Saunders Company: West Washington Square

Philadelphia, Pa. 19105


12 Dyott Street
London WC1A 1DB

1835 Yonge Street
Toronto 7, Ontario


Linear Functional Analysis: An Introduction to Lebesgue Integration and
Infinite-Dimensional Problems SBN 0-7216-3395-1


© 1970 by W. B. Saunders Company. Copyright under the International Copyright
Union. All rights reserved. This book is protected by copyright. No part of it
may be reproduced, stored in a retrieval system, or transmitted in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise, without
written permission from the publisher. Made in the United States of America.
Press of W. B. Saunders Company. Library of Congress Catalog card number
77-92130.


Print No.: 9 8 7 6 5 4 3 2



PREFACE 


In recent decades the expression "functional analysis" has come to
cover a rapidly expanding and increasingly diffuse area of mathematics.
One may adopt, as a rough working definition, the following: functional
analysis is that discipline which has arisen out of the attempt to extract 
from diverse problems of analysis their common underlying features and 
to develop an abstract theory which is applicable to these diverse problems 
as particular cases. 

It has often happened in the history of mathematics that an abstract 
theory which has "spun off" from more concrete subject-matter is pursued
for its own sake and that its practitioners forget (or never learn) the
"practical" sources of the subject in which they are so deeply interested.
(To cite only one example, it would take very little effort to discover, in 
many graduate departments of mathematics, a substantial number of 
students [and also some faculty members] who can quote and prove with 
great precision the Fredholm Alternative without realizing that it is 
simply an abstract version of the fundamental result in the theory of 
integral equations.) This is doubly unfortunate; on the one hand, a 
student who takes a highly abstract course may fail to realize how he
might effectively apply the theory to a specific problem on which he is 
working, and, on the other hand, the research mathematician who works 
on the abstract theory without knowing its origins may be depriving 
himself of a rich source of interesting and important topics of investigation. 





This short book is written with the purpose of introducing the reader 
to a very limited, and comparatively elementary, portion of the vast field 
of functional analysis in such a manner that he will see that the develop- 
ment of the subject-matter was stimulated by significant problems of 
concrete, or “hard,” analysis, and that, conversely, the abstract theory 
may often be effectively employed in the study of specific concrete 
problems. It is my firm belief that such an introduction to functional 
analysis should precede a more ambitious and more abstract course in 
the subject, which is now required as part of the program of studies 
leading to the doctorate in many (probably most) departments of mathe- 
matics. Furthermore, it is my hope that this book may prove helpful to 
students and practitioners of disciplines other than mathematics who 
must make effective use of a somewhat limited mathematical training. 
It has become commonplace to say that when mathematicians and users 
of mathematics separate themselves from each other the result is that both 
suffer, but the fact that this remark is commonplace detracts neither from 
its correctness nor its pertinence. I shall be pleased if this book helps to
some extent in encouraging mutually advantageous contact between the 
makers and the users of mathematics. 

It remains to express appreciation to those persons who have helped 
in various ways to bring this book into being: A number of students who 
urged me to expand a set of lecture notes into book form; my wife, 
Florence, and son, David, who encouraged me and who converted 
scrawled sheets into typed manuscript; Dr. Fred Greenleaf of New York 
University, who enthusiastically and unselfishly devoted much time and 
energy to reading the manuscript, suggesting additional topics, detecting 
errors, and making many valuable suggestions for improving the ex- 
position; Mr. Peter Renz of Reed College, who also read the manuscript 
and made a number of valuable suggestions; and Mr. George Fleming, 
the mathematics editor of W. B. Saunders Co., who, along with his 
capable assistants, saw the book through the many stages from hand- 
written notes to finished product. 


Albuquerque, New Mexico 


Bernard Epstein 



NOTATION 


§6-3 denotes the third section of Chapter 6.

Theorem 6-3 denotes the third theorem of the sixth section of the
current chapter; when reference is made to a theorem in another chapter,
we write, for example, "Theorem 6-3 of Chapter 4." Similar notation is
used for definitions, exercises, and equations.

Any item indicated by a capital letter appears in the corresponding
appendix; for example, equation (H-2) is the second equation of Appendix
H.



3. Measure of More General Sets 36 

4. Measurable Functions 44 

5. Integration of Non-negative Functions 49 

6. Integration of Real-valued and Complex-valued Functions 54 

7. Integration over Plane Sets 57 

8. Concluding Remarks 60 

Chapter 3 

THE $L^p$- AND $l^p$-SPACES 62

1. Basic Concepts 62 

2. The Holder and Minkowski Inequalities 64 

3. Definition of a Metric in $L^p$ 66

4. Completeness of $L^p$ 67

5. The Space $L^\infty$ 69

6. The Spaces and P 70 

7. Separability of and P 73 

Chapter 4 

NORMED LINEAR SPACES 76 

1. Linear Spaces 76 

2. Normed Linear Spaces 80 

3. Inner-product Spaces 87 

4. Hilbert Spaces 92 

5. Orthonormal Bases in Hilbert Spaces 97 

Chapter 5 

LINEAR FUNCTIONALS 102 

1. Basic Definitions and Concepts 102 

2. The Principle of Uniform Boundedness 104 

3. Bounded Linear Functionals in Hilbert Spaces 109 

4. The Dual Space of Lp(A) 116 

5. The Hahn-Banach Theorem 120 


Chapter 6 

OPERATORS 126 

1. Linear Transformations and Operators 126 

2. The Adjoint Operator 129 

3. The Inverse of an Operator 131 



4. Sequences of Operators

5. Hermitian Operators 137 

6. Projections 143 

7. The Spectrum of an Operator 146 

8. Spectra of Hermitian, Normal, and Unitary Operators ... 149 

Chapter 7 

OPERATORS ON FINITE-DIMENSIONAL SPACES 153

1. Matrix Representation of Linear Transformations 153 

2. Eigenvalues and Eigenvectors 157 

3. Finite-dimensional Inner-product Spaces 162 

4. Kellogg’s Method of Estimating the Largest Eigenvalue . . 168 

5. Spectral Representation of Hermitian Operators 171 

6. Spectral Representation of Normal Operators 174 

Chapter 8 

ELEMENTS OF SPECTRAL THEORY IN INFINITE- 
DIMENSIONAL HILBERT SPACES 177 

1. Completely Continuous Operators 177 

2. Spectral Analysis of a Hermitian Completely Continuous 

Operator 181 

3. The Fredholm Alternative 185 

4. Survey of the Fredholm Theory of Integral Equations ... 189 

5. Estimation of the Largest Eigenvalue 193 

APPENDICES 

A. Partially Ordered Sets and Zorn’s Lemma 197 

B. Concerning the Spectrum of an Operator on a Com- 

plex Banach Space 200 

C. The Stieltjes Integral 201 

D. The Weierstrass Approximation Theorem and Ap- 

proximation by Trigonometric Polynomials .... 203

E. The Structure of Open Sets of Real Numbers .... 207 


F. Infinite Series and the Number System $[0, +\infty]$ ... 208





G. Limit Superior and Limit Inferior 211 

H. The Fourier Transform in L^(R) 214 

SOME SUGGESTIONS FOR FURTHER READINGS 221 

CITED REFERENCES 223 

INDEX 225 



CHAPTER I 


METRIC SPACES 


§1. SET-THEORETIC NOTATION 

We shall begin by dispensing with the customary survey of intuitive 
or naive set theory, for at this stage in his mathematical studies the reader 
has, almost certainly, already been exposed to an excessive number of 
such presentations. Instead, we shall content ourselves with a statement 
of most of the very small amount of set-theoretic notation which will be 
used. A few additional notations will be introduced as the need arises. 
Membership and non-membership in a set will be denoted by the
symbols $\in$ and $\notin$, respectively. $\bigcup_{\alpha \in I} E_\alpha$ will denote the union of the sets
$E_\alpha$, where $\alpha$ ranges over the index-set $I$; if the index-set is clearly
understood from the context, the less explicit notation $\bigcup_\alpha E_\alpha$ or even $\bigcup E_\alpha$
will be used. If the index-set consists of the finite set of integers
$\{1, 2, 3, \ldots, n\}$ or the set of all positive integers $\{1, 2, 3, \ldots\}$ we shall use such
self-explanatory notation as $\bigcup_{k=1}^{n} E_k$, $E_1 \cup E_2 \cup \cdots \cup E_n$, $\bigcup_{k=1}^{\infty} E_k$, or
$E_1 \cup E_2 \cup E_3 \cup \cdots$. For intersections we shall use analogous notations,
with $\bigcup$ and $\cup$ replaced by $\bigcap$ and $\cap$, respectively. The symbol $\subset$ denotes
proper set inclusion, while $\subseteq$ denotes either proper or improper set
inclusion. The empty set will be denoted $\emptyset$. The difference between the
sets $A$ and $B$ will be denoted $A - B$. (It is not required that $B \subseteq A$.)
The complement of $A$ will be denoted $A^c$; of course, $A^c$ is well defined
only when a "universal set" $U$ is prescribed, either explicitly or implicitly.







Exercises 

1. Give a reasonable example of a situation in which one encounters 

a distance function which violates the condition (/) of Defini- 
tion 1. (Hint: Consider a number of cities connected by a 
network of roads.) 

2. Let M be a metric space and let N be a non-empty subset of M,
with distance between elements of N defined as in M. Show
that N is then also a metric space. Whenever we refer to a
non-empty subset of a metric space as a metric space in its
own right, it is understood that the metric is defined in this
way. (That is, the metric in N is simply the restriction of the
original metric to $N \times N$.)

3. Consider the metric spaces described in Examples (/) and (/?); 

show that a pair of functions, f and g, may be close to each
other in one metric and far apart in the other. 

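4. Let $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ be real numbers, at least
one of the $b_k$ being different from zero. Show that the
quadratic equation

$$\sum_{k=1}^{n} (a_k - \lambda b_k)^2 = 0$$

cannot have two distinct real roots. Deduce from this fact
the Cauchy-Schwarz inequality

$$\left( \sum_{k=1}^{n} a_k b_k \right)^2 \le \left( \sum_{k=1}^{n} a_k^2 \right) \left( \sum_{k=1}^{n} b_k^2 \right).$$

(Of course, if the $b_k$'s all vanish there is nothing to prove.)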

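5. Use the Cauchy-Schwarz inequality to establish (for arbitrary real
numbers $a_1, a_2, \ldots, a_n$, $b_1, b_2, \ldots, b_n$) the inequality

$$\left( \sum_{k=1}^{n} (a_k + b_k)^2 \right)^{1/2} \le \left( \sum_{k=1}^{n} a_k^2 \right)^{1/2} + \left( \sum_{k=1}^{n} b_k^2 \right)^{1/2}.$$

From this result demonstrate, in turn, the inequality (valid
for arbitrary real numbers $a_1, \ldots, a_n$, $b_1, \ldots, b_n$, $c_1, \ldots, c_n$)

$$\left( \sum_{k=1}^{n} (a_k - c_k)^2 \right)^{1/2} \le \left( \sum_{k=1}^{n} (a_k - b_k)^2 \right)^{1/2} + \left( \sum_{k=1}^{n} (b_k - c_k)^2 \right)^{1/2}.$$

(Note that for $n = 3$ this demonstrates the triangle inequality
for the space defined in Example (/).)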





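6. For arbitrary complex numbers $a_1, \ldots, a_n$, $b_1, \ldots, b_n$, $c_1, \ldots, c_n$ prove the inequalities

$$\left| \sum_{k=1}^{n} a_k b_k \right| \le \left( \sum_{k=1}^{n} |a_k|^2 \right)^{1/2} \left( \sum_{k=1}^{n} |b_k|^2 \right)^{1/2},$$

$$\left( \sum_{k=1}^{n} |a_k + b_k|^2 \right)^{1/2} \le \left( \sum_{k=1}^{n} |a_k|^2 \right)^{1/2} + \left( \sum_{k=1}^{n} |b_k|^2 \right)^{1/2},$$

$$\left( \sum_{k=1}^{n} |a_k - c_k|^2 \right)^{1/2} \le \left( \sum_{k=1}^{n} |a_k - b_k|^2 \right)^{1/2} + \left( \sum_{k=1}^{n} |b_k - c_k|^2 \right)^{1/2}.$$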


§3. ELEMENTARY TOPOLOGY OF METRIC SPACES 

In this and the next section we shall encounter certain concepts which
are obvious extensions of those arising in the study of the fundamental 
principles of real analysis, and so our discussions will justifiably be quite 
concise. 

Given a metric space M, a subset S of M, and an element x of M
(not necessarily contained in S), we say that x is a limit-point of S if for
every positive number $\epsilon$ there exists at least one element y belonging to S
and distinct from x such that $\rho(x, y) < \epsilon$. Alternatively (cf. Exercise 1),
x is a limit-point of S if for every positive number $\epsilon$ there exist infinitely
many members of S whose distance from x is less than $\epsilon$. (In particular,
if S consists of only a finite number of points, it has no limit-point.) The
set S is said to be closed if it contains all its limit-points. (Of course, if S
has no limit-points, it contains all its limit-points, and so it is closed; in
particular, any finite subset of M is closed.)

Having defined a limit-point, we now quote without proof the follow- 
ing theorem, which is one of the foundation stones of real analysis. If the 
reader is not thoroughly acquainted with this theorem and its proof he 
should study it carefully in a text dealing with the fundamentals of real 
analysis. 

Theorem 1 (Bolzano-Weierstrass): Let S be an infinite bounded*
subset of the real number system R. Then S has (but does not necessarily
contain) at least one limit-point.

A point y belonging to the subset S of M is termed an inner point of S
if there exists some positive number $\delta$ such that every point of M whose
distance from y is less than $\delta$ is a member of S. The set S is said to be
open if all its points are inner points (of S). The relation between open
and closed sets is the same as that which holds in R: the set S is open iff†
$S^c$ is closed. (Cf. Exercise 2.)

* A subset S of R is said to be bounded if there exist real numbers a and b such that
the inequalities $a \le x \le b$ hold for every member x of S. Equivalently, S is said to be
bounded if there exists a positive number c such that the inequality $|x - y| < c$ holds
for all members x and y of S.

† If and only if.





For any positive number $\epsilon$ and for any point x of M, $S_\epsilon(x)$ and $S_\epsilon[x]$
denote the subsets of M consisting of all points y satisfying the inequalities
$\rho(x, y) < \epsilon$ and $\rho(x, y) \le \epsilon$, respectively. Such sets are called open balls
and closed balls, respectively. (The justification of these terms is the
subject of Exercise 9.) The point x is termed the center of the ball, and $\epsilon$
is termed the radius.

In a very reasonable sense, open balls are the building blocks from 
which all open sets are constructed. This statement is made precise in 
the following theorem, whose simple proof is left to the reader as Exercise
10.


Theorem 2: A subset S of a metric space is open iff it can be
expressed as the union of a collection (not necessarily finite or countably
infinite) of open balls; equivalently, S is open iff for every element x in S
there exists a positive number $\epsilon$ such that $S_\epsilon(x) \subseteq S$.

The interior of a subset S of M, denoted int S, is defined as the set
of all inner points of S; int S is always an open set, and S is open iff
S = int S. (Cf. Exercise 3.) Similarly, the closure of S is defined as the
union of S and the set of all its limit-points; it is denoted $\bar{S}$. The set $\bar{S}$ is
always closed, and S is closed iff $S = \bar{S}$; hence, for any set S, $\bar{\bar{S}} = \bar{S}$.
(Cf. Exercise 4.)

At this point we draw the reader's attention to Exercise 4-10, which 
ties up the concepts of limit-point and closure with that of a convergent 
sequence. 

We conclude this section with an important definition: The subset
S of M is called dense if $\bar{S} = M$.

Exercises 

1. Prove the equivalence of the two given definitions of a limit-point.

2. Prove the assertion that S is open iff $S^c$ is closed.

3. Prove that S is open iff S = int S, and that, for any set S,
int (int S) = int S.

4. Prove that S is closed iff $S = \bar{S}$, and that, for any set S, $\bar{\bar{S}} = \bar{S}$.

5. Prove that any metric space M (considered as a subset of itself) 

is both open and closed; also that the empty set $\emptyset$ (considered
as a subset of M) is both open and closed.

6. Prove that $\emptyset$ and R are the only subsets of R which are both

open and closed. 

7. In contrast to the preceding exercise, prove that every subset of 

the metric space defined in Example (/>) of §2 is both open and 
closed. 





8. Prove that the union of a finite collection of closed subsets and

the intersection of any collection of closed subsets are closed. 
Then state and prove corresponding assertions concerning 
open subsets. 

9. Prove that open and closed balls are open and closed sets, 

respectively. 

10. Prove Theorem 2. 

11. Prove that $\overline{S_\epsilon(x)} \subseteq S_\epsilon[x]$. Show, by means of a simple example,
that the inclusion may be proper. 

§4. CONVERGENCE AND COMPLETENESS 

The reader is certainly acquainted with the concept of a sequence of 
real numbers; a formal definition of such a sequence is that it is a function 
(mapping, transformation) from the set of positive integers into the set of 
real numbers, but informally one may think of a non-terminating succession 
of real numbers labeled with the positive integers. Similarly, we can 
define a sequence of objects which are members of any non-empty set. 
(It should be emphasized that repetitions are permitted; the tenth term 
may be the same as the fifth, and it may even happen that all terms of the 
sequence coincide.) We shall, of course, be interested in sequences whose 
terms are points of a given metric space M; more briefly, sequences in M.

Such a sequence, say $x_1, x_2, x_3, \ldots$, is said to converge to the point
x if for every positive number $\epsilon$ there exists a positive integer N (depending
usually on $\epsilon$) such that $\rho(x, x_n) < \epsilon$ for all (not merely some) integers n
exceeding N. The sequence is said to be convergent, or to converge, if
there exists a point x to which the sequence converges. It is a very
elementary but important fact that a sequence which converges to x
cannot also converge to some other point y. For suppose that this could
occur. Then for any positive number $\epsilon$, we could choose an integer $N_x$
such that $\rho(x, x_n) < \epsilon$ whenever $n > N_x$, and similarly we could choose an
integer $N_y$ such that $\rho(y, x_n) < \epsilon$ whenever $n > N_y$. Choosing an integer
n exceeding both $N_x$ and $N_y$, we would then obtain, by the triangle
inequality, $\rho(x, y) \le \rho(x, x_n) + \rho(y, x_n) < \epsilon + \epsilon = 2\epsilon$, and so, because
of the arbitrariness of $\epsilon$, $\rho(x, y) = 0$, or $x = y$. Thus, we are justified in
speaking of x as the limit, rather than a limit, of a convergent sequence.
We often write $x_n \to x$, or $\lim_{n \to \infty} x_n = x$.

Given a sequence converging to some limit x, we can also assert
that terms far out in the sequence are very close to each other; for, given
any positive number $\epsilon$, we can choose an integer N such that whenever
$n > N$, $\rho(x, x_n) < \epsilon/2$. Choosing integers p and q exceeding N, we obtain,
again by the triangle inequality, $\rho(x_p, x_q) \le \rho(x_p, x) + \rho(x_q, x) < \epsilon/2 +
\epsilon/2 = \epsilon$.





One might suppose that the preceding reasoning is reversible; i.e.,
that if for every positive $\epsilon$ there exists an integer N such that $\rho(x_p, x_q) < \epsilon$
whenever p and q both exceed N, then the sequence is convergent. This
supposition is, however, false. To prove this, let us consider the
metric space R and in this space form the convergent sequence 1.4, 1.41,
1.414, . . . consisting of successive decimal approximations to $\sqrt{2}$. If, however,
we consider the given sequence as a sequence in Q rather than in R,
it is not convergent, for there is no number in Q to which the terms of
the sequence come closer and closer. (Remember that $\sqrt{2}$ is irrational,
or, in other words, that it does not exist if the universe is Q rather than
R.)

We are therefore led to define a sequence (in a metric space) to be a
Cauchy sequence if for every positive number $\epsilon$ there exists an integer N
such that $\rho(x_p, x_q) < \epsilon$ whenever p and q both exceed N; and then we
define a metric space M to be complete if every Cauchy sequence in M is
convergent. Referring back to the previous paragraph, we see that the
sequence 1.4, 1.41, 1.414, ... is a Cauchy sequence, but not a convergent
sequence, in Q, and so the metric space Q is not complete. The reader is
presumably aware that R, in contrast, is complete, and that this is the
fundamental reason why R, rather than Q, serves as the setting for real
analysis.
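
The behaviour of the sequence 1.4, 1.41, 1.414, ... can be checked with exact rational arithmetic. The following is only a sketch and is not part of the original text: the helper name `sqrt2_truncation` is hypothetical, and Python's `fractions.Fraction` is assumed here as a stand-in for the rational number system Q.

```python
from fractions import Fraction
from math import isqrt


def sqrt2_truncation(n: int) -> Fraction:
    """Return sqrt(2) truncated to n decimal places, as an exact rational."""
    scale = 10 ** n
    # isqrt(2 * scale**2) is the largest integer r with r**2 <= 2 * scale**2,
    # so r / scale is sqrt(2) cut off after n decimal places.
    return Fraction(isqrt(2 * scale * scale), scale)


# terms[p] is the truncation to p + 1 decimal places: 1.4, 1.41, 1.414, ...
terms = [sqrt2_truncation(n) for n in range(1, 9)]

# Cauchy property: any two terms from index p onward differ by less than 10**-(p + 1).
cauchy_ok = all(
    abs(terms[q] - terms[p]) < Fraction(1, 10 ** (p + 1))
    for p in range(len(terms))
    for q in range(p + 1, len(terms))
)

# No term is a rational square root of 2, so none of them can serve as the limit.
squares_hit_two = any(t * t == 2 for t in terms)
```

The check covers only the first eight terms, so it illustrates the Cauchy property rather than proving it; the argument in the text is what establishes it for all terms.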


Exercises 

1. A point x is said to be isolated if {x} (i.e., the set consisting of the
single point x) is open. Prove that if x is not isolated there
exists a sequence, consisting of points distinct from x and
from each other, which converges to x.

2. Prove that a non-empty subset of a complete metric space is 

complete iff it is closed. 

3. Prove that every subsequence* of a Cauchy sequence is Cauchy 

and that every subsequence of a convergent sequence is 
convergent to the same limit. 

4. Suppose that a Cauchy sequence possesses a convergent sub- 

sequence. Prove that the original sequence is also convergent. 

5. Let M be a complete metric space and let $F_1, F_2, F_3, \ldots$ be non-empty
nested subsets of M. (That is, $\cdots \subseteq F_3 \subseteq F_2 \subseteq F_1$.)


* Let $a_1, a_2, a_3, \ldots$ be any sequence and let $n_1, n_2, n_3, \ldots$ be a strictly increasing
sequence of positive integers (i.e., $0 < n_1 < n_2 < n_3 < \cdots$). Then the sequence
$a_{n_1}, a_{n_2}, a_{n_3}, \ldots$ is a subsequence of the original sequence. Thus, a subsequence is
obtained by removing some, perhaps infinitely many, of the terms of the given sequence, 
provided that an infinite number of terms remain and the order in which the terms 
appear is not altered. 





Show that if each of the $F_k$'s is closed and if the diameter* of $F_k$
approaches zero with increasing k, then $\bigcap_{k=1}^{\infty} F_k$ is not empty;
in fact, the intersection consists of exactly one point. (This is
the Cantor intersection theorem.)

6. Prove (by considering various subsets of R) that the assertion 

made in the preceding exercise fails if either the completeness 
condition, the closedness condition, or the condition on the 
diameters is omitted. 

7. Let M be any metric space and let two sequences in M (not
necessarily convergent, or even Cauchy), say $x_1, x_2, \ldots$
and $y_1, y_2, \ldots$, be called equivalent if $\lim_{n \to \infty} \rho(x_n, y_n) = 0$.
First, show that this is indeed an equivalence relation.†
Second, show that if one of two equivalent sequences is convergent,
then the other is also convergent and the two sequences
have the same limit.

8. (Continuation of Exercise 7). Let A be the class of all Cauchy
sequences in M, and let $\mathbf{x}_1$ and $\mathbf{x}_2$ be two members of A:

$$\mathbf{x}_1 = \{x_1^{(1)}, x_2^{(1)}, x_3^{(1)}, \ldots\}, \qquad \mathbf{x}_2 = \{x_1^{(2)}, x_2^{(2)}, x_3^{(2)}, \ldots\}.$$

Show that $\lim_{n \to \infty} \rho(x_n^{(1)}, x_n^{(2)})$ exists and is unaltered if $\mathbf{x}_1$ and
$\mathbf{x}_2$ are replaced by equivalent sequences $\mathbf{x}_1'$ and $\mathbf{x}_2'$, respectively;
also, show that this limit is zero iff $\mathbf{x}_1$ and $\mathbf{x}_2$ are equivalent.

Finally, show how these results can be used to define a
complete metric space whose elements are the various equivalence
classes of Cauchy sequences in M and such that the
elements of M, in a reasonable sense, constitute a dense
subset. (This new space is termed the sequential completion
of M; if M is complete, then it is essentially indistinguishable
from its sequential completion.)

9. Show that the diameter of a subset equals the diameter of its 
closure. 


* The diameter of a non-empty subset S of a metric space M is defined as $\sup_{x, y \in S} \rho(x, y)$.
The diameter may be infinite, of course, as in the case M = R, A =

† A relation $\sim$ defined on a non-empty set A is termed an equivalence relation if
$x \sim x$ for every x in A, if $x \sim y$ implies $y \sim x$, and if $x \sim y$ and $y \sim z$ implies $x \sim z$.
(For the definition of a relation on a non-empty set, the reader may refer to the first
two sentences of Appendix A.) Any equivalence relation in A determines a partition of
A (i.e., a decomposition of A into disjoint subsets such that elements x and y belong to
the same subset iff $x \sim y$). These subsets are called equivalence classes of A (with
respect to the given equivalence relation). 





10. (a) Let S be a subset of a metric space M. Show that the point
x is a limit-point of S iff there exists a sequence in $S - \{x\}$
which converges to x.

(b) Show that S is closed iff every point in M which is the limit of a
sequence in S belongs to S.

(c) Show that $\bar{S}$ is the set of all points in M which are limits of
sequences in S.

11. Show that the Bolzano-Weierstrass theorem can be reformulated
as follows: Given any bounded infinite subset S of R, it is
possible to extract from S a convergent sequence $x_1, x_2, x_3, \ldots$
whose terms are all distinct. (Of course, the limit of this
sequence need not belong to S.)

12. Prove that the diameter of either an open ball $S_\epsilon(x)$ or a closed
ball $S_\epsilon[x]$ cannot exceed $2\epsilon$. Show by means of a simple
example that the diameter may be less than $2\epsilon$. Can the diameter
of a ball equal zero? If so, what can you say about the
point x?


§5. THE CONTRACTION THEOREM 

Many mathematical problems may be stated in the form $x = Tx$,
where the unknown x is some kind of mathematical object (perhaps, but
not necessarily, a real number), while Tx denotes the result of carrying
out some operation on x. We deliberately content ourselves with this
rather vague statement at this time. We shall first give a few simple
illustrations and then proceed to develop a very general method which is
often effective in solving such problems.

(i) Let us begin with the extremely trivial algebraic problem $x =
\tfrac{1}{2}x + 5$. We may think of T as being the operation of multiplying any
real number by $\tfrac{1}{2}$ and adding 5 to the result. If we guess that x = 10 is the
solution (or at least a solution), we can confirm the correctness of this
guess by replacing x on both sides of the equation by 10 and observing
that we obtain 10 = 10. On the other hand, if we guess that x = 11 is a
solution, we shall obtain the false statement 11 = 10.5 when we replace
x on both sides by 11. However, if we replace x by 11 only on the right
side, we find that the right side reduces to 10.5, and this may suggest trying
10.5 as a new (and hopefully better) guess. Indeed, the right side then
becomes 10.25, and by repeatedly replacing x on the right side by the
preceding numerical result, it is quite clear that we obtain a sequence
of numbers which converges to 10, which is indeed a solution (in fact, the
only solution) of the problem. The reader should have no difficulty in
showing that any initial guess, no matter how far from the number 10,





will give rise, when the preceding procedure is followed, to a sequence of
"improved guesses" which converges to 10.
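
The procedure of example (i) is easy to try numerically. The sketch below is an illustration added to the text, not the author's; the names `iterate` and `T` are chosen here for convenience, and ordinary floating-point arithmetic is assumed to be adequate for this problem.

```python
def iterate(T, x0, steps):
    """Return [x0, T(x0), T(T(x0)), ...] after `steps` applications of T."""
    xs = [x0]
    for _ in range(steps):
        xs.append(T(xs[-1]))
    return xs


def T(x):
    # The operation of example (i): multiply by 1/2 and add 5.
    return 0.5 * x + 5


# Starting from the guess 11, each new guess is half as far from 10 as the last.
guesses = iterate(T, 11.0, 5)

# A very poor initial guess still leads to 10 after enough steps.
far_start = iterate(T, -1000.0, 60)
```

Here `guesses` begins 11, 10.5, 10.25, as in the text, and the last entry of `far_start` agrees with 10 to within rounding error.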

(ii) Suppose we convert the preceding problem into the entirely
equivalent form $x = 2x - 10$. It is not difficult to see that if we employ
any initial guess other than 10, the succeeding guesses will get worse
rather than better; for example, if we choose 10.01 as the initial guess, the
succeeding guesses will form the sequence 10.02, 10.04, 10.08, 10.16, and
so forth. Quite clearly the factor 2 appearing before the x on the right
side is responsible for doubling the error at each successive stage, while the
factor $\tfrac{1}{2}$ appearing in the earlier formulation is responsible for halving the
error at each successive stage.
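
The doubling of the error in example (ii) can be confirmed exactly. This is a small sketch added for illustration; it assumes Python's `fractions.Fraction` so that 10.01 is represented without rounding, and the variable names are arbitrary.

```python
from fractions import Fraction


def T(x):
    # The rearranged problem of example (ii): x = 2x - 10.
    return 2 * x - 10


x = Fraction(1001, 100)   # the initial guess 10.01, held exactly
guesses = []
errors = [x - 10]         # signed distance of each guess from the solution 10
for _ in range(4):
    x = T(x)
    guesses.append(x)
    errors.append(x - 10)
```

The list `guesses` reproduces 10.02, 10.04, 10.08, 10.16, and each entry of `errors` is exactly twice the preceding one.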

(iii) Let us write the equation $x^2 - 3x + 2 = 0$ (which has the two
solutions x = 1, 2) in the form $x = x^2/3 + \tfrac{2}{3}$. If we choose as an initial
guess x = 1.5 and substitute into the right side, we obtain the succeeding
guess $x = \tfrac{17}{12}$, which is smaller than the initial guess. By inserting the new
value into the right side, we obtain a still smaller guess, and it is not
difficult to see that by repeating this procedure indefinitely we obtain a
sequence which converges to 1, which is indeed one of the two solutions
of the given problem. On the other hand, if we choose as the initial guess
the number 3, the resulting sequence will increase without bound. Finally,
we remark that, unless the initial guess is exactly 2 or -2, the resulting
sequence will certainly not converge to the solution 2. (Cf. Exercise 4.)
Thus, in contrast to example (i), where the method described always
succeeds, and in contrast to example (ii), where the method described
always fails unless the initial guess is exactly correct, we have in the present
situation a much more delicate state of affairs.
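
The three kinds of behaviour described in example (iii) can be observed directly. The following sketch is an added illustration under stated assumptions: the helper `orbit` is a hypothetical name, exact fractions are used where an exact value matters, and floating-point numbers are used for the long runs.

```python
from fractions import Fraction


def T(x):
    # Example (iii): x = x**2 / 3 + 2/3, a rearrangement of x**2 - 3*x + 2 = 0.
    return (x * x + 2) / 3


def orbit(x0, steps):
    """Return [x0, T(x0), T(T(x0)), ...] after `steps` applications of T."""
    xs = [x0]
    for _ in range(steps):
        xs.append(T(xs[-1]))
    return xs


first_step = T(Fraction(3, 2))        # exact value of the guess following 1.5
from_1_5 = orbit(1.5, 200)            # drifts down towards the solution 1
from_3 = orbit(3.0, 8)                # grows very rapidly
stays_at_2 = orbit(Fraction(2), 5)    # the exact guess 2 is never moved
lands_on_2 = orbit(Fraction(-2), 5)   # -2 is carried to 2 in a single step
just_below_2 = orbit(1.99, 200)       # a guess near 2 still ends up at 1
```

The run starting at 1.99 shows how delicate the situation is: a guess very close to the solution 2 is nevertheless carried away from it and towards the solution 1.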

These three simple problems should suffice to give the reader an idea 
of the kind of theorem we are aiming at. 


Theorem 1 (Contraction Theorem): Let M be a complete metric
space and let T be a function from M into M; i.e., for every x in M, Tx is a
uniquely determined member of M. Suppose that T is contracting; i.e.,
there exists a fixed positive number k less than unity such that, for any
elements x, y of M, the inequality $\rho(Tx, Ty) \le k\rho(x, y)$ holds. Then the
equation $x = Tx$ possesses a unique solution in M, and it can be obtained by
iteration, or successive approximation, beginning with any element y of M.
By this we mean that, no matter how y is chosen in M, the sequence* $y, Ty,
T^2y, T^3y, \ldots$ converges to the unique solution of the preceding equation.

Proof: First we establish that the given problem has at most one
solution. Suppose, on the contrary, that $x_1 = Tx_1$ and $x_2 = Tx_2$, where
$x_1 \ne x_2$. Then $0 < \rho(x_1, x_2) = \rho(Tx_1, Tx_2) \le k\rho(x_1, x_2)$, and so

$$(1 - k)\rho(x_1, x_2) \le 0;$$


* Of course, $T^2y$ means $T(Ty)$, $T^3y$ means $T(T^2y)$, and so forth.





this is impossible, since the left side of the inequality is the product of two 
positive numbers. Thus the solution, if any exists, is unique. 

Now, we observe that $\rho(T^2f, T^2g) = \rho(T(Tf), T(Tg)) \le k\rho(Tf, Tg) \le
k^2\rho(f, g)$, and by an obvious induction we obtain $\rho(T^nf, T^ng) \le k^n\rho(f, g)$
for any positive integer n. If $0 < m < n$, we then obtain, from the
triangle inequality,

$$\begin{aligned}
\rho(T^m y, T^n y) &\le \rho(T^m y, T^{m+1} y) + \rho(T^{m+1} y, T^{m+2} y) + \cdots + \rho(T^{n-1} y, T^n y) \\
&\le k^m \rho(y, Ty) + k^{m+1} \rho(y, Ty) + \cdots + k^{n-1} \rho(y, Ty) \\
&= \frac{k^m - k^n}{1 - k}\, \rho(y, Ty) \le \frac{k^m}{1 - k}\, \rho(y, Ty).
\end{aligned}$$

Given $\epsilon > 0$, we can therefore choose $N(\epsilon)$ so large that whenever $N <
m < n$ the inequality $\rho(T^m y, T^n y) < \epsilon$ holds. Thus, the sequence $y, Ty,
T^2y, \ldots$ is Cauchy, and, since M is complete,* by hypothesis, this sequence
must converge to some element x. Furthermore, for any n, $\rho(x, Tx) \le
\rho(x, T^n y) + \rho(T^n y, T^{n+1} y) + \rho(T^{n+1} y, Tx) \le \rho(x, T^n y) + k^n \rho(y, Ty) +
k\rho(x, T^n y)$. As $n \to \infty$, all three terms on the right approach zero, and
since the left side of the inequality, namely $\rho(x, Tx)$, is independent of n,
it follows that $\rho(x, Tx) = 0$, or $x = Tx$. The proof is thus complete.
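
The estimate obtained in the proof, namely that the distance between the m-th and n-th iterates is at most k^m/(1 - k) times the distance from y to Ty, can be tested on the operator of example (i). This sketch is an added illustration; it assumes M = R with the usual distance |x - y|, the constant k = 1/2, and the starting element y = 11, none of which is prescribed by the proof itself.

```python
def T(x):
    # Operator of example (i); it is contracting on R with constant k = 1/2.
    return 0.5 * x + 5


k = 0.5
y = 11.0

# iterates[j] holds the j-th iterate of y under T.
iterates = [y]
for _ in range(30):
    iterates.append(T(iterates[-1]))

d0 = abs(iterates[0] - iterates[1])   # distance from y to Ty


def bound(m):
    """Right-hand side of the estimate in the proof: k**m / (1 - k) * rho(y, Ty)."""
    return k ** m / (1 - k) * d0


# The estimate should hold for every pair of indices 0 < m < n.
bound_holds = all(
    abs(iterates[m] - iterates[n]) <= bound(m)
    for m in range(1, 30)
    for n in range(m + 1, 31)
)
```

Because the bound depends only on m, k, and the first step from y to Ty, it can be evaluated before any further iterates are computed, which is what makes it useful for deciding in advance how many steps are needed.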

For obvious reasons, the element x whose existence has just been 
demonstrated is called a fixed point of the function, or operator, T. Much 
of modern analysis (particularly non-linear analysis) is devoted to the 
demonstration of the existence of fixed points of functions defined on 
various kinds of spaces (often, but not always, metric spaces). The 
preceding theorem, simple though it may appear, serves as a prototype of 
much more difficult theorems; furthermore, as we shall illustrate in the 
next section, it is itself a tool of great usefulness and power. 

Exercises 

1. Show, by means of a simple example, that it does not suffice, in
the contraction theorem, to replace the hypothesis $k < 1$ by
the weaker hypothesis $k \le 1$, nor does it suffice to assume that
$\rho(Tx, Ty) < \rho(x, y)$ whenever $x \ne y$. (That is to say, the
ratio $\rho(Tx, Ty)/\rho(x, y)$ must be bounded away from unity as
x and y range over M, not merely less than unity.)

2. Show that some power of T may be contracting even though T 

itself is not contracting. 


* It may appear that the hypothesis of completeness is employed simply in order 
to enable us to push through the proof. However, Exercise 6 shows that this hypothesis 
is indispensable. 





3. (Continuation of Exercise 2). Suppose that $T^p$ is contracting for
some integer $p > 1$; as before, we assume that M is complete.
Show that the sequence $y, Ty, T^2y, \ldots$ converges, regardless
of the choice of y, and that the limit of the sequence, x, is a
fixed point of T (not merely of $T^p$); furthermore, show that
T has only one fixed point, even though T is not assumed to be
contracting.

4. Prove the assertion made in example (iii) to the effect that, except
for two particular initial guesses, the solution x = 2 will never
be obtained by the iterative procedure.

5. Let M be the subset [0, 1.4] of R and let $Tx = x^2/3 + \tfrac{2}{3}$ for every
x in M. Show that all hypotheses of the contraction theorem
are satisfied, so that the equation $x^2 - 3x + 2 = 0$ has
exactly one solution in M.

6. Let M be the metric space consisting of all rational numbers in the 

closed interval [|, ll] (with the metric derived from R), and let 
$Tx = x + (2 - x^2)/10$. Show that all the hypotheses of the
contraction theorem except the completeness of M are
satisfied and that T has no fixed point. 
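
The iterative procedure behind the contraction theorem is easy to try numerically. The following Python sketch is only an illustration: the helper name `iterate_to_fixed_point`, the tolerance, and the starting values are assumptions made here, not part of the text. It iterates the map $Tx = x^2/3 + 2/3$ of Exercise 5 and, in floating-point arithmetic rather than in the rationals, the map $Tx = x + (2 - x^2)/10$ of Exercise 6.

```python
import math

def iterate_to_fixed_point(T, x0, tol=1e-12, max_steps=10_000):
    """Iterate x -> T(x) until two successive iterates differ by less than tol."""
    x = x0
    for step in range(1, max_steps + 1):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next, step
        x = x_next
    raise RuntimeError("no convergence within max_steps")

# Exercise 5: on [0, 1.4] the derivative 2x/3 is at most 2.8/3 < 1.
T5 = lambda x: x * x / 3 + 2 / 3
root5, steps5 = iterate_to_fixed_point(T5, 0.0)

# Exercise 6, run in floating point: the iterates approach sqrt(2),
# which is not a rational number.
T6 = lambda x: x + (2 - x * x) / 10
root6, steps6 = iterate_to_fixed_point(T6, 1.0)

print(root5, steps5)
print(root6, steps6, math.sqrt(2))
```

The second run suggests why completeness matters in Exercise 6: the only candidate for a fixed point satisfies $x^2 = 2$ and therefore lies outside the space of rationals.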


§6. APPLICATION OF THE CONTRACTION 
THEOREM TO DIFFERENTIAL EQUATIONS 

In this section we give one application of major significance of the
contraction theorem. The basic problem of the entire theory of ordinary
differential equations is the following: Given a function $f(x, y)$ defined
in some domain D (open, connected, non-empty set) of the plane and
given an interior point $(x_0, y_0)$ of D, does there exist a solution of the
differential equation $dy/dx = f(x, y)$ passing through this point; that is,
a differentiable function $g(x)$ defined in some interval $I = [c, d]$ containing
$x_0$ in its interior such that $g(x_0) = y_0$ and $dg/dx = f(x, g(x))$ for all x in
the interval I? Furthermore, if such a solution exists, is it unique?
Simple examples (such as $f(x, y) = 0$, $f(x, y) = 1$, $f(x, y) = 1 + \ldots$),
where the problem can be solved by elementary techniques, suggest that
the answer to both questions is affirmative. On the other hand, it is
obvious that some reasonable hypotheses must be imposed in order to
guarantee the existence of a solution; for example, if $f(x, y)$ equals 0
whenever x is rational and equals 1 whenever x is irrational, it is clear that
no solution exists in any interval, however small. This simple example
suggests that some hypothesis involving continuity of $f(x, y)$ should be
imposed. On the other hand, the case $f(x, y) = 2|y|^{1/2}$ shows that mere
continuity is insufficient to guarantee uniqueness. (Cf. Exercise 1.)

The following theorem, while not the strongest of its type that is 
known, plays a vital role in the theory of ordinary differential equations. 
While it should be part of the analytical arsenal of every mathematician, 
we present it here as an excellent illustration of how the contraction theorem 
can often be used to establish, on the one hand, both existence and 
uniqueness of a solution to a given problem and to provide, on the other 
hand, a constructive procedure for obtaining this solution. 


Theorem 1 (Cauchy-Picard): Let $f(x, y)$ be real-valued and
continuous in some domain D containing the point $(x_0, y_0)$, and suppose that

$$|f(x, y_1) - f(x, y_2)| \le K\,|y_1 - y_2|,$$

where K is a positive constant,*
whenever $(x, y_1)$ and $(x, y_2)$ belong to D. Then there exists some interval
$I = [x_0 - a, x_0 + a]$ and a function $g(x)$ defined and continuously
differentiable in this interval and satisfying the conditions $g(x_0) = y_0$ and
$dg/dx = f(x, g(x))$. Furthermore, there is only one function satisfying all these
conditions.


Proof: If the function g satisfies the given differential equation
and the condition $g(x_0) = y_0$, we find by integration that g must satisfy
the integral relation

$$g(x) = y_0 + \int_{x_0}^{x} f(\xi, g(\xi))\,d\xi;$$

conversely, if g satisfies this relation it must satisfy both the given
differential equation and the condition $g(x_0) = y_0$. Thus, the preceding
integral relation is entirely equivalent to the given differential equation
together with the prescribed initial condition. This fact suggests very
strongly that the function g should be thought of as a fixed point (in a
suitable space of functions) of a certain integral operator. We now proceed
to follow up this idea in detail.

First, choose numbers $\alpha$, $\beta$ which are positive but so small that the
closed set S defined by the inequalities $|x - x_0| \le \alpha$, $|y - y_0| \le \beta$ is
contained in D, and let $m = \max_{(x, y) \in S} |f(x, y)|$. Then let

$$a = \min\{\alpha,\ \beta/m,\ 2/(3K)\}$$


* A function $f(x, y)$ satisfying such a condition is said to satisfy a Lipschitz
condition, or to be Lipschitz continuous, and any number K which can be employed in the
preceding inequality is termed a Lipschitz constant of $f(x, y)$.





and let $S_1$ be the closed subset of D defined by the inequalities $|x - x_0| \le a$,
$|y - y_0| \le \beta$. (Clearly, $S_1 \subseteq S$, and $|f(x, y)| \le m$ everywhere in $S_1$.)
Now let M be the class of all real-valued continuous functions defined on
$[x_0 - a, x_0 + a]$ whose range* is in the interval $[y_0 - \beta, y_0 + \beta]$, and let
the distance between any two members of M be defined by the equation
$\rho(f, g) = \max_{|x - x_0| \le a} |f(x) - g(x)|$. M is easily shown to be a complete
metric space. (Cf. Exercise 2.)

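Next, we define Tg, for every g in M, as follows: Tg = h, where

$$h(x) = y_0 + \int_{x_0}^{x} f(\xi, g(\xi))\,d\xi, \qquad x \in [x_0 - a, x_0 + a].$$

We note that this definition is meaningful, for $f(\xi, g(\xi))$ is continuous;
hence the integral is well defined and depends continuously on x, so that
$h(x)$ is continuous on $[x_0 - a, x_0 + a]$. Furthermore, if $g_1$ and $g_2$ are
any two members of M, we have

$$(Tg_1)(x) - (Tg_2)(x) = \Bigl[y_0 + \int_{x_0}^{x} f(\xi, g_1(\xi))\,d\xi\Bigr] - \Bigl[y_0 + \int_{x_0}^{x} f(\xi, g_2(\xi))\,d\xi\Bigr] = \int_{x_0}^{x} \{f(\xi, g_1(\xi)) - f(\xi, g_2(\xi))\}\,d\xi,$$

and so

$$\rho(Tg_1, Tg_2) \le \max_{|x - x_0| \le a} \Bigl|\int_{x_0}^{x} \{f(\xi, g_1(\xi)) - f(\xi, g_2(\xi))\}\,d\xi\Bigr| \le \max_{|x - x_0| \le a} \Bigl|\int_{x_0}^{x} K\,|g_1(\xi) - g_2(\xi)|\,d\xi\Bigr| \le Ka\,\rho(g_1, g_2) \le \tfrac{2}{3}\,\rho(g_1, g_2).$$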

Finally, we observe that

$$|h(x) - y_0| = \Bigl|\int_{x_0}^{x} f(\xi, g(\xi))\,d\xi\Bigr| \le ma \le \beta,$$

so that Tg belongs to M whenever g does; in other words, T maps M
into M. Thus, we are assured by the contraction theorem that there
exists one and only one member g of M satisfying the equation g = Tg,
which is identical with the preceding integral relation.

Finally, it is evident that any function $g(x)$ satisfying in the interval
$[x_0 - a, x_0 + a]$ the differential equation $dy/dx = f(x, y)$ and the
condition $g(x_0) = y_0$ must be a member of M and must satisfy g = Tg.
Hence the proof of the entire theorem is complete.
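
The proof is constructive: starting from any member of M and applying T repeatedly produces functions converging to the solution. The following Python sketch illustrates this on a grid, under assumptions that are not in the text: the test equation $dy/dx = y$ with $y(0) = 1$, the half-width a = 0.5 (compatible with the theorem's choice when K = 1, $\beta = 1$, m = 2), the trapezoidal rule for the integral, and the helper name `picard_iterates`.

```python
import math

def picard_iterates(f, x0, y0, a, n_grid=2001, n_iter=25):
    """Apply (Tg)(x) = y0 + integral from x0 to x of f(t, g(t)) dt repeatedly on a grid."""
    h = 2 * a / (n_grid - 1)
    xs = [x0 - a + i * h for i in range(n_grid)]
    mid = (n_grid - 1) // 2            # index of x0 (n_grid is odd)
    g = [y0] * n_grid                  # starting guess: the constant function y0
    for _ in range(n_iter):
        vals = [f(x, y) for x, y in zip(xs, g)]
        new_g = [0.0] * n_grid
        new_g[mid] = y0
        # trapezoidal rule, integrating to the right of x0 ...
        for i in range(mid + 1, n_grid):
            new_g[i] = new_g[i - 1] + 0.5 * h * (vals[i - 1] + vals[i])
        # ... and to the left of x0 (the integral changes sign)
        for i in range(mid - 1, -1, -1):
            new_g[i] = new_g[i + 1] - 0.5 * h * (vals[i + 1] + vals[i])
        g = new_g
    return xs, g

xs, g = picard_iterates(lambda x, y: y, x0=0.0, y0=1.0, a=0.5)
max_error = max(abs(gi - math.exp(x)) for x, gi in zip(xs, g))
print(max_error)
```

The printed value is the largest deviation from the exact solution $e^x$ on the grid; it reflects both the finite number of iterations and the quadrature error of the trapezoidal rule.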


* We remind the reader that the range of a function is the set of all values of the
function; that is, the object y belongs to the range of the function f iff there exists an
object x such that $f(x)$ is defined and equals y.





Exercises 

1. Show that the differential equation $dy/dx = 2|y|^{1/2}$ is satisfied by
the functions $g_1(x) = 0$, $g_2(x) = x|x|$, $g_3(x) = \max\{0, x|x|\}$,
and $g_4(x) = \min\{0, x|x|\}$ and that all four of these functions
satisfy the initial condition $g_k(0) = 0$.

2. Prove that $C([a, b])$, as defined in §2, is a complete metric space;
furthermore, prove that the subset of $C([a, b])$ consisting of
functions satisfying the inequalities $c \le f(x) \le d$ for all $x \in [a, b]$
is also a complete metric space. Is this still true if we
replace $c \le f(x) \le d$ by $c < f(x) < d$?
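
The failure of uniqueness for $dy/dx = 2|y|^{1/2}$ can be checked numerically before it is proved. The Python sketch below is only a plausibility check, not a proof: the central-difference step, the sample points, and the names `residual` and `candidates` are assumptions made for illustration. It compares a difference quotient of each of the four functions 0, $x|x|$, $\max\{0, x|x|\}$, $\min\{0, x|x|\}$ with the right-hand side of the equation.

```python
def f(x, y):
    """Right-hand side of dy/dx = 2 |y|^(1/2)."""
    return 2 * abs(y) ** 0.5

candidates = {
    "g1": lambda x: 0.0,
    "g2": lambda x: x * abs(x),
    "g3": lambda x: max(0.0, x * abs(x)),
    "g4": lambda x: min(0.0, x * abs(x)),
}

def residual(g, x, h=1e-6):
    """|central difference quotient of g at x  -  f(x, g(x))|."""
    derivative = (g(x + h) - g(x - h)) / (2 * h)
    return abs(derivative - f(x, g(x)))

sample_points = [k / 10 for k in range(-20, 21)]
worst = {name: max(residual(g, x) for x in sample_points)
         for name, g in candidates.items()}
initial_values = {name: g(0.0) for name, g in candidates.items()}
print(worst)
print(initial_values)
```

All four residuals are of the order of the difference step, and all four functions vanish at 0, so four distinct solutions pass through the same point.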


§7. THE BAIRE CATEGORY THEOREM 

At the end of §3 we defined a subset S of a metric space M to be
dense if $\bar{S} = M$. It should be immediately evident (cf. Exercise 1) that the
following definition is entirely equivalent: The subset S is dense if for
every non-empty open subset G of M the intersection $S \cap G$ is non-empty.
In other words, a dense subset of M penetrates every non-empty open
subset of M. It may be worth pointing out that it is possible for both S
and its complement to be dense; for example, both Q and $R - Q$ are
dense subsets of R.

We now define a subset T of a metric space M to be nowhere dense
iff $\bar{T}$ contains no open ball; this, in turn, is clearly equivalent to the
assertion that the interior of $\bar{T}$ is empty. We state the three following
theorems, leaving the proofs to the reader as Exercise 2.

Theorem 1: T is nowhere dense iff $\bar{T}$ is nowhere dense.

Theorem 2: A closed set T is nowhere dense iff $M - T$ is dense.

Theorem 3: T is nowhere dense iff every non-empty open subset G
of M possesses a non-empty open subset $G'$ such that $T \cap G' = \emptyset$.

As trivial examples, we point out that any finite subset of R, and also
the set of all integers, are nowhere dense subsets of R. A more interesting
example is the following: Let M be the closed interval [0, 1], with the
metric obtained from R, and let us enumerate* the rational numbers in the
interior of M. We thus obtain the sequence $r_1, r_2, r_3, \ldots$. Cover each


* We assume that the reader is acquainted with the fact that the set of rational
numbers in any interval, or even in all of R, can be put in one-to-one correspondence 
with the natural numbers 1, 2, 3, ... . 





of these numbers with an open interval contained in the interior of M and having length ….
The union S of these intervals is certainly a dense open subset of
M, and so the complement, $M - S$, is nowhere dense. It is quite remarkable
that the measure of S is less than … + … + ···, or …, so that
$M - S$ possesses measure exceeding …. (The precise meaning of this
sentence will become clear to the reader, if he is not yet acquainted with
measure theory, in the following chapter.) Thus, although $M - S$, in
some reasonable sense, consists of most of M, it is nowhere dense. In
geometrical terms, the reason that $M - S$ is nowhere dense is that S,
although having small measure, is formed from open intervals which are
thoroughly scattered over the interval M.
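
The construction of a dense open set of small measure can be imitated in exact rational arithmetic. The Python sketch below rests on assumptions that are not taken from the text: the interval around the k-th rational is given length `total_budget / 2**k` with `total_budget = 1/4`, the rationals are enumerated by increasing denominator, and only the first 200 of them are covered (so the resulting finite union is, of course, not yet dense). The helper names are hypothetical.

```python
from fractions import Fraction

def rationals_in_open_unit_interval(count):
    """First `count` rationals p/q in (0, 1), in lowest terms, ordered by denominator."""
    found = []
    q = 2
    while len(found) < count:
        for p in range(1, q):
            r = Fraction(p, q)
            if r.denominator == q:          # skip fractions not in lowest terms
                found.append(r)
                if len(found) == count:
                    break
        q += 1
    return found

def covered_length(intervals):
    """Total length of a union of intervals, counting overlaps only once."""
    total = Fraction(0)
    cur_left = cur_right = None
    for left, right in sorted(intervals):
        if cur_right is None or left > cur_right:
            if cur_right is not None:
                total += cur_right - cur_left
            cur_left, cur_right = left, right
        else:
            cur_right = max(cur_right, right)
    if cur_right is not None:
        total += cur_right - cur_left
    return total

def cover(count, total_budget=Fraction(1, 4)):
    """Intervals of length at most total_budget / 2**k about the k-th rational, kept inside [0, 1]."""
    intervals = []
    for k, r in enumerate(rationals_in_open_unit_interval(count), start=1):
        half = total_budget / 2 ** (k + 1)
        intervals.append((max(r - half, Fraction(0)), min(r + half, Fraction(1))))
    return intervals

measure = covered_length(cover(200))
print(float(measure))
```

However many rationals are covered, the total length stays below the chosen budget, because the lengths are dominated by a convergent geometric series.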

The following theorem appears rather negative in spirit; it asserts 
that something cannot be done. Nevertheless, it is one of the most useful 
and powerful theorems in the theory of metric spaces. In the following 
section we shall give one extremely interesting application of this theorem, 
and it will also play a vital role in later developments. 

Theorem 4 (Baire Category Theorem): A complete metric
space cannot be expressed as the union of countably many nowhere dense
subsets.

Proof: Suppose that the complete metric space M can be expressed
as $\bigcup_{n=1}^{\infty} A_n$, where each $A_n$ is nowhere dense. Then we can find a
non-empty open set $G_1$ which contains no point in common with $A_1$. Let $x_1$
be any point of $G_1$, and let $\varepsilon_1$ be chosen so small that $S_{\varepsilon_1}[x_1] \subseteq G_1$.
Since $S_{\varepsilon_1}(x_1)$ is a non-empty open set we can find a non-empty open subset
$G_2$ of $S_{\varepsilon_1}(x_1)$ which is disjoint from $A_2$, and then we can choose a point
$x_2$ in $G_2$ and a positive number $\varepsilon_2$ less than $\tfrac{1}{2}\varepsilon_1$ and so small that
$S_{\varepsilon_2}[x_2] \subseteq G_2$. Continuing in this manner we obtain a sequence of nested closed
balls $S_{\varepsilon_n}[x_n]$ with $\varepsilon_n$ approaching zero. By Exercise 4-12 the diameters
of these balls also approach zero. Since M is complete, the Cantor
intersection theorem assures us that there exists one (and only one) point
contained in all the sets $S_{\varepsilon_n}[x_n]$, and hence in none of the sets $A_n$. Thus,
contrary to assumption, $M \neq \bigcup_{n=1}^{\infty} A_n$, and so the proof is complete.

The explanation of the rather undescriptive title of this theorem is 
the following : A metric space is said to be of first category if it can be 
expressed as a union of countably many nowhere dense sets; otherwise, 
it is of second category. Thus, the theorem which we have just proven 
can be stated in the following concise form: A complete metric space is of 
second category. 

Exercises 

1. Prove the assertion made in the second sentence of this section. 

2. Prove the three theorems stated at the end of the second paragraph 

of this section. 





3. Prove that the word “complete” may not be omitted in the 
statement of the Baire category theorem. 


§8. THE EXISTENCE OF A CONTINUOUS 
NOWHERE-DIFFERENTIABLE FUNCTION 

The functions that one usually encounters in elementary real analysis 
possess derivatives for all values of the independent variable, with the 
possible exception of a certain set of values which are rare in some sense. 
This led to the conviction during the early stages of the development of 
the differential calculus that a continuous function, except perhaps on a 
rare subset, is everywhere differentiable. We shall now show, by a 
suitable application of the Baire category theorem, that there exist 
continuous real-valued functions (defined on all of R) which are
non-differentiable everywhere! It should be recalled, for clarity, that the
function f is said to be differentiable at the point x of R if f is defined
throughout some open interval containing x and if

$$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

exists finitely; equivalently, there exists a number a (necessarily unique)
such that $|f(x + h) - f(x) - ah| = o(h)$, where $o(h)$ denotes a function
of h such that $o(h)/h \to 0$ as $h \to 0$.

Let P be the collection of all real-valued continuous functions
defined on R and possessing period one. If we define the distance between
two members of P by the equation

$$\rho(f, g) = \max_{x \in R} |f(x) - g(x)| \quad \Bigl(= \max_{x \in [0, 1]} |f(x) - g(x)|\Bigr),$$

P clearly becomes a complete metric space. We shall find it convenient to
employ the notation $\|f\|$ instead of $\rho(f, 0)$ (where 0 here denotes the
function which vanishes identically on R), so that $\|f - g\| = \rho(f, g)$.
We shall use the following two facts about P:

(a) Given any positive number $\varepsilon$ (however small) and any positive
number k (however large), there exists in P a member f which satisfies
the inequality $\|f\| < \varepsilon$ and which possesses everywhere a left-hand and a
right-hand derivative which are finite but greater in absolute value than k.
(Remember that if f possesses at x finite left-hand and right-hand
derivatives, then f is differentiable at x iff the values of the left-hand and
right-hand derivatives coincide.) Rather than giving a formal proof of





this assertion, we suggest that the reader think of a saw whose cutting 
edge consists of many short but steep teeth. 
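
The saw with many short, steep teeth can be written down explicitly. The following Python sketch is one possible choice, offered as an assumption rather than as the construction intended in the text: it scales the triangular wave "distance to the nearest integer" so that the resulting function has period one, sup norm $\varepsilon/2$, and one-sided derivatives equal to plus or minus `eps * teeth`, where the integer `teeth` exceeds $k/\varepsilon$. The name `make_saw` is hypothetical.

```python
import math

def make_saw(eps, k):
    """Return (f, slope): f has period one, sup norm eps/2, one-sided slopes +-slope with slope > k."""
    teeth = math.floor(k / eps) + 1        # integer number of teeth per period, larger than k/eps

    def f(x):
        t = teeth * x
        return eps * abs(t - round(t))     # eps times the distance from t to the nearest integer

    return f, eps * teeth

eps, k = 0.01, 1000.0
f, slope = make_saw(eps, k)
largest_value = max(f(i / 1000) for i in range(1001))
print(slope, largest_value)
```

Because `teeth` is an integer, replacing x by x + 1 shifts `t` by an integer and leaves the value unchanged, which is the periodicity required of members of P.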

(b) Given any f in P and any positive number $\varepsilon$, there exists in P a
function g which is continuously differentiable everywhere in R and whose
distance from f is less than $\varepsilon$. (That is, the continuously differentiable
members of P form a dense subset of P.) Without going into all the
details, we sketch the proof as follows: From a basic theorem of real
analysis, the function f, being continuous, can be approximated within
$\varepsilon/3$ everywhere on [0, 1] by a polygonal function $g_1$, a continuous function
whose graph in this interval consists of a finite number of straight line
segments. If $g_1(0) \neq g_1(1)$, it is possible to find another polygonal
function $g_2$ such that $g_2(0) = g_2(1)$ and such that $|g_2(x) - g_1(x)| < \varepsilon/3$,
and hence $|g_2(x) - f(x)| < 2\varepsilon/3$, everywhere on [0, 1]. The latter
inequality persists if $g_2$ is defined everywhere on R by the periodicity
condition $g_2(x + 1) = g_2(x)$. Finally, the corners appearing in the graph
of $g_2$ can be rounded off to furnish a periodic continuously differentiable
function g such that $|g(x) - g_2(x)| < \varepsilon/3$, and hence $|g(x) - f(x)| < \varepsilon$,
everywhere.

Now for every positive integer n we define a subset $E_n$ of P in the
following manner: The function f (belonging to P) is contained in $E_n$ iff
there exists at least one value of x such that $|(f(x + h) - f(x))/h| \le n$
for all h in the open interval $(0, 1/n)$. Note carefully that if f has a finite
right-hand derivative $f'_+$ at even one value of x, f belongs to some $E_n$.
(For example, if $f'_+(x_0) = -27.3$, then $|(f(x_0 + h) - f(x_0))/h| < 28$ for
all sufficiently small positive h, not necessarily for $0 < h < 1/28$, but, let
us say, for $0 < h < 1/73$. Then clearly $f \in E_{73}$.) In order not to break
the main line of the argument, let us accept momentarily the fact that
each $E_n$ is a closed and nowhere dense subset of P. Since P is complete,
$\bigcup_{n=1}^{\infty} E_n \neq P$ (by the Baire category theorem), and so there exists an
element of P (that is, a periodic continuous function) which is outside
every $E_n$. This function does not have a finite right-hand derivative
anywhere, and therefore it is certainly nowhere differentiable.

It now remains to justify the previous assertions that each $E_n$ is closed
and that each $E_n$ is nowhere dense. (Actually, only the second assertion
is needed in order to justify the preceding proof, but we shall use the
first in proving the second.)

(a) Suppose that $E_n$ is not closed. Then there exists a member f of P
which is not in $E_n$ and there exists a sequence $f_1, f_2, \ldots$ of members of $E_n$
such that $\|f - f_k\| \to 0$ as $k \to +\infty$. For each k we can choose a point
$x_k$ somewhere in [0, 1] such that $|(f_k(x_k + h) - f_k(x_k))/h| \le n$ whenever
$0 < h < 1/n$. By the Bolzano-Weierstrass theorem we can select a
subsequence of the $f_k$'s in such a manner that the corresponding $x_k$'s
converge to some point in [0, 1]. Retaining the notation $f_1, f_2, \ldots$ for
this subsequence of functions and $x_1, x_2, \ldots$ for the corresponding
subsequence of points, we have $\|f - f_k\| \to 0$, and the $x_k$ converge,
as $k \to +\infty$. Choose any number h in





It is a most significant and interesting fact that the two given equivalent 
definitions of continuity are, in turn, equivalent to a definition in which 
only the concept of an open set is involved. (This fact is employed to 
provide the definition of continuity when mappings between topological 
spaces are considered.) 

Theorem 1: The function f mapping the metric space M into the
metric space N is continuous at the point $x_0$ (of M) iff the inverse image*
of every neighborhood† of $f(x_0)$ contains a neighborhood of $x_0$. The function
f is continuous iff the inverse image of every open subset of N is an open
subset of M.

Proof: Suppose f is continuous at $x_0$. Then, according to the
preceding definition, the inverse image of the open ball $S_\varepsilon(f(x_0))$ contains
the open ball $S_\delta(x_0)$, where $\varepsilon$ is any given positive number and the positive
number $\delta$ is chosen appropriately. Since every neighborhood of $f(x_0)$
contains some open ball $S_\varepsilon(f(x_0))$, the inverse image of any neighborhood
of $f(x_0)$ must contain a neighborhood of $x_0$. Conversely, suppose that
the inverse image of every neighborhood of $f(x_0)$ contains a neighborhood
of $x_0$; then, in particular, the inverse image of $S_\varepsilon(f(x_0))$ contains a
neighborhood of $x_0$, and a neighborhood of $x_0$ must contain $S_\delta(x_0)$ for
sufficiently small $\delta$. This completes the proof of the first part of the
theorem.

The proof of the second half is an easy extension of the foregoing
argument, and is left to the reader as Exercise 1.

It should be emphasized that for a given $\varepsilon$ the choice of $\delta$ usually
depends on the point $x_0$; in general it is not possible to choose a single $\delta$
which will suffice at each point of M. This remark leads to the following
definition.

Definition 2: The function f (mapping M into N) is said to be
uniformly continuous if for every positive number $\varepsilon$ it is possible to find a
number $\delta$ (depending only on $\varepsilon$ and the function f) such that
$\rho(f(x), f(x_0)) < \varepsilon$ whenever $\rho(x, x_0) < \delta$.


* If f is a function mapping M into N (where M and N are arbitrary non-empty
sets, not necessarily metric spaces) and if A is any subset of M, then $f(A)$, the image of
A (under the function f), is the subset of N consisting of all objects y such that $y = f(x)$
for at least one member x of A. Conversely, if B is any subset of N, then $f^{-1}(B)$, the
inverse image of B, is the subset of M consisting of all objects x in M such that $f(x) \in B$.
Exercise 6 lists the most important facts about images and inverse images.

† A neighborhood of a point $x_0$ in a metric space M is an open subset of M which
contains $x_0$. Some authors use the term "neighborhood of $x_0$" to mean a subset A of
M, not necessarily open, such that there exists an open set G satisfying the conditions
$x_0 \in G \subseteq A$.



9 , CONTINUOUS FUNCTIONS 21 


It is one of the most important theorems of real analysis that a con- 
tinuous real-valued function defined on a bounded closed subset of R is 
uniformly continuous. In §10 we shall discuss certain ideas which enable 
us to generalize this result. (Cf. Exercise 10-5.) 
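
The dependence of $\delta$ on the point $x_0$ can be made quantitative for $f(x) = x^2$, which is continuous on R but not uniformly continuous. The Python sketch below is an illustration only; the function name `largest_delta` and the sample points are assumptions. For $x_0 \ge 0$ the largest admissible $\delta$ is $\sqrt{x_0^2 + \varepsilon} - x_0$, which is computed here in the algebraically equivalent form $\varepsilon / (\sqrt{x_0^2 + \varepsilon} + x_0)$ to avoid cancellation.

```python
import math

def largest_delta(x0, eps):
    """Largest delta such that |x**2 - x0**2| < eps whenever |x - x0| < delta (for x0 >= 0)."""
    return eps / (math.sqrt(x0 * x0 + eps) + x0)

eps = 0.1
points = (0.0, 1.0, 10.0, 100.0, 1000.0)
deltas = [largest_delta(x0, eps) for x0 in points]
for x0, delta in zip(points, deltas):
    print(x0, delta)
```

The admissible $\delta$ shrinks toward zero as $x_0$ grows, so no single $\delta$ serves every point of R; on a bounded interval, by contrast, the smallest of these values is positive.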


Exercises 

1. Complete the proof of Theorem 1. 

2. (a) Let $f(x) = x^2$ for every real number x. Show that the function
f is continuous, but not uniformly continuous. On the other
hand, show that the restriction of f to any bounded interval
(open, closed, or half-open) is indeed uniformly continuous.
(b) Let $f(x) = x/(1 + x^2)$ for every real number x. Show that f is
uniformly continuous.

3. Let $f_1$ and $f_2$ be continuous real-valued functions defined on the
metric space M, and let a and b be any real numbers. Prove
that the function $af_1 + bf_2$ is also continuous, and that if $f_1$
and $f_2$ are both uniformly continuous, $af_1 + bf_2$ is also
uniformly continuous. Also, prove that the product $f_1 f_2$ is
continuous and that if $f_2$ never assumes the value zero the quotient
$f_1/f_2$ is also continuous.

4. (a) Let $f_1, f_2, f_3, \ldots$ be a sequence of continuous functions
mapping the metric space M into the metric space N, and
suppose that $\lim_{n \to \infty} f_n(x)$ exists for every point x of M. Show
by an example that the limit function may be discontinuous.
(b) Suppose that the sequence appearing in part (a) is uniformly
convergent, i.e., that for every positive number $\varepsilon$ there exists
an integer $N(\varepsilon)$, independent of x, such that
$\rho(f_k(x), \lim_{n \to \infty} f_n(x)) < \varepsilon$ whenever $k > N(\varepsilon)$, for all points x in M.
Prove that in this case the limit function is certainly
continuous.

(c) Demonstrate by means of an example that the limit of the
sequence of continuous functions $f_1, f_2, f_3, \ldots$ may be
continuous even though the sequence may fail to converge
uniformly. (For convenience, consider functions mapping R
into R, i.e., choose M = N = R.)

5. (a) Let M be a metric space, $\tilde{M}$ a non-empty subset of M, and x
a point of M. Then $\rho(x, \tilde{M})$, the distance from x to $\tilde{M}$, is
defined as $\inf_{y \in \tilde{M}} \rho(x, y)$. Prove that $\rho(x, \tilde{M}) = 0$ iff x belongs to
the closure of $\tilde{M}$. Also show that
$\rho(x, \tilde{M})$ is, for fixed $\tilde{M}$, a continuous function from M into R.

(b) Let $M_1$ and $M_2$ be non-empty, closed, disjoint subsets of the metric
space M. Prove that there exists a continuous real-valued





function f defined on M which satisfies the following conditions:
$f(x) = 0$ if $x \in M_1$, $f(x) = 1$ if $x \in M_2$, $0 < f(x) < 1$ if
$x \notin M_1 \cup M_2$. (A similar theorem holds true in topological
spaces possessing the property called normality; this theorem,
known as Urysohn's lemma, is one of the most important
results in general topology.)

(c) Let $M_1$ and $M_2$ be non-empty subsets of a metric space. The
distance between these subsets, denoted $\rho(M_1, M_2)$, is defined
as $\inf_{x \in M_1,\, y \in M_2} \rho(x, y)$. (Obviously, if $M_1$ and $M_2$ are not
disjoint the distance between them is zero.) Show that there
exist two non-empty, closed, disjoint subsets, $M_1$ and $M_2$, of
R such that $\rho(M_1, M_2) = 0$.

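6. Let f be a function mapping the set M into the set N, let $\{A_\alpha\}$
denote any collection of subsets of M, and let $\{B_\beta\}$ denote any
collection of subsets of N. Prove the following equalities and
inclusions:

(i) If $A_1 \subseteq A_2$, then $f(A_1) \subseteq f(A_2)$; $f(\bigcup A_\alpha) = \bigcup f(A_\alpha)$;
$f(\bigcap A_\alpha) \subseteq \bigcap f(A_\alpha)$.

(ii) If $B_1 \subseteq B_2$, then $f^{-1}(B_1) \subseteq f^{-1}(B_2)$; $f^{-1}(\bigcup B_\beta) = \bigcup f^{-1}(B_\beta)$;
$f^{-1}(\bigcap B_\beta) = \bigcap f^{-1}(B_\beta)$.

Show by a simple example that the inclusion relation appearing
in the last part of (i) cannot be replaced by equality.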

§10. COMPACTNESS 

The property of compactness is one of the most significant that arises 
in the study of metric spaces (and, more generally, in the study of topo- 
logical spaces). We shall discuss this important concept only very briefly 
in this section, leaving the proof of the most important result to the 
reader in the form of a chain of exercises. 

We begin by recalling the Heine-Borel theorem, which should be
familiar to the reader from his study of the fundamentals of real analysis:

Let A be a non-empty, bounded, closed subset of R, and let $\{G_\alpha\}$ be a
collection of open subsets of R such that $A \subseteq \bigcup_\alpha G_\alpha$. Then it is possible
to select a finite number of $G_\alpha$'s, say $G_{\alpha_1}, \ldots, G_{\alpha_n}$, such that
$A \subseteq \bigcup_{k=1}^{n} G_{\alpha_k}$. Also, from every sequence $x_1, x_2, x_3, \ldots$ in A it is possible to
extract a subsequence which converges to a member of A. Conversely, if
either of the preceding conclusions holds true, so does the other, and A
must be closed and bounded.*

* Strictly speaking, the Heine-Borel theorem consists only of the first two sentences 
of this paragraph. The third sentence follows easily from the Bolzano-Weierstrass
theorem and the definition of a closed set, while the proof of the last sentence is quite 
elementary. We remind the reader of the fact that the Heine-Borel theorem is the 
essential tool needed in proving (among other things) that a continuous real-valued 
(or complex-valued) function defined on a bounded closed subset of R is uniformly 
continuous. 





We now proceed to present a number of definitions which provide the 
basis for suitably extending the Heine-Borel theorem to a certain class of 
metric spaces. 

Definition 1: An open covering of a metric space M is a collection
$\{G_\alpha\}$ of open subsets of M whose union is M; that is, each point of M is
contained in at least one of the $G_\alpha$'s.

Definition 2: A metric space M is said to be compact if from every
open covering of M it is possible to extract a finite subcollection of the
$G_\alpha$'s which constitute an open covering of M. A subset N of the metric space
M is said to be compact if N is either empty or is compact when considered
as a metric space in its own right (with the metric derived from M).

Definition 3: A metric space is said to be locally compact if each
point has a neighborhood whose closure is compact.

Definition 4: A metric space M is said to be sequentially compact
if from every sequence $x_1, x_2, x_3, \ldots$ in M it is possible to extract a
subsequence which is convergent.

Definition 5: A B-W (Bolzano-Weierstrass) space is a metric space
having the property that every infinite subset has at least one limit-point
(which need not belong to the subset).

Definition 6: A metric space M is said to be totally bounded if for
every positive number $\varepsilon$ there exists a finite collection of points

$$\{x_1, x_2, \ldots, x_n\}$$

such that the open balls $\{S_\varepsilon(x_1), S_\varepsilon(x_2), \ldots, S_\varepsilon(x_n)\}$ constitute an open
covering of M.
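
Total boundedness is easy to exhibit concretely. The following Python sketch uses an example that is an assumption of this illustration and does not come from the text: the unit square $[0, 1] \times [0, 1]$ with the Euclidean metric. A square grid of spacing at most $\varepsilon$ serves as the finite collection of centers, since every point of the square then lies within $\varepsilon/\sqrt{2}$ of some grid point. The helper names are hypothetical, and the random test merely samples the covering property rather than proving it.

```python
import math
import random

def epsilon_net_unit_square(eps):
    """Return grid points whose open eps-balls cover the unit square."""
    n = math.ceil(1 / eps)                 # grid spacing 1/n is at most eps
    return [(i / n, j / n) for i in range(n + 1) for j in range(n + 1)]

def is_covered(point, centers, eps):
    """True if the point lies in the open eps-ball about at least one center."""
    return any(math.dist(point, c) < eps for c in centers)

eps = 0.1
centers = epsilon_net_unit_square(eps)
random.seed(0)
samples = [(random.random(), random.random()) for _ in range(2000)]
all_covered = all(is_covered(p, centers, eps) for p in samples)
print(len(centers), all_covered)
```

The number of centers grows roughly like $1/\varepsilon^2$ as $\varepsilon$ decreases, but it is finite for every positive $\varepsilon$, which is all that the definition demands.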

Theorem 1: If a metric space possesses any one of the following
four properties, it possesses the other three, so that all four properties are
equivalent:

(a) compactness,

(b) sequential compactness,

(c) B-W property,

(d) completeness and total boundedness.

The proof is contained in Exercises* 9 to 15. The reader will certainly 

* Although the reader is strongly urged to try to prove these results without
referring to outside sources, it may be mentioned that an excellent presentation of the
proof of Theorem 1 will be found in [Simmons, pp. 121-125].





find it helpful to keep in mind, while working these problems, the proof 
from real analysis that a subset of R is compact iff it is closed and 
bounded. 

Now let M be a compact metric space and let $C(M)$ be the class of all
continuous real-valued functions defined on M. From Exercises 9-3 and
10-4 it is easily seen that $C(M)$ becomes a metric space if the distance
between any two members f and g of $C(M)$ is defined by the equation
$\rho(f, g) = \max_{x \in M} |f(x) - g(x)|$. From Exercise 9-4 it becomes
evident that $C(M)$, when provided with this metric, is complete. Clearly
$C(M)$ is not compact, for it is not bounded* (simply consider the constant
functions!), and hence certainly not totally bounded (cf. Exercise 3).
This remark suggests the problem of characterizing the compact subsets of
$C(M)$. By referring to condition (d) of Theorem 1 and recalling (cf.
Exercise 4-2) that a (non-empty) subset of a complete metric space is
complete iff it is closed, we see that it remains only to obtain a suitable
description of total boundedness in $C(M)$. For this purpose we introduce
the concept of equicontinuity.

Definition 7: A (non-empty) collection of functions $\{f_\alpha\}$ from a
metric space M into a metric space N is said to be equicontinuous if for
every positive number $\varepsilon$ there exists a positive number $\delta$ such that, for every
index $\alpha$, $\rho(f_\alpha(x), f_\alpha(y)) < \varepsilon$ whenever $\rho(x, y) < \delta$.
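
A family can fail to be equicontinuous even though each member is uniformly continuous; the functions $\sin nx$ of Exercise 6 are the standard instance. The Python sketch below is an illustration under assumptions made here (the helper name `witness` and the particular choice of points): for any proposed $\delta$ it picks an index n with $\pi/(2n) < \delta$ and the points x = 0, $y = \pi/(2n)$, at which the values of $\sin nx$ differ by 1.

```python
import math

def witness(delta):
    """Return (n, x, y, gap) with |x - y| < delta and gap = |sin(n x) - sin(n y)| equal to 1 up to rounding."""
    n = math.floor(math.pi / (2 * delta)) + 1     # guarantees pi / (2 n) < delta
    x, y = 0.0, math.pi / (2 * n)
    return n, x, y, abs(math.sin(n * x) - math.sin(n * y))

results = [witness(delta) for delta in (1.0, 0.1, 0.01, 0.001)]
for n, x, y, gap in results:
    print(n, y - x, gap)
```

Since the gap stays at 1 however small $\delta$ is taken, no $\delta$ can serve the whole family for, say, $\varepsilon = 1/2$.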

The following theorem is now an easy consequence of the preceding 
definition and the preceding theorem. 

Theorem 2 (Ascoli-Arzela): A (non-empty) subset of $C(M)$,
where M is a compact metric space, is compact iff it is closed, bounded, and
equicontinuous.

Exercises 

1. Show that a compact subset of a metric space must be closed and 

that a closed subset of a compact metric space is compact. 

2. Show that the metric space defined in Example (b) of §2 is 

bounded, but that it is not totally bounded if it contains 
infinitely many points. Show directly that in the latter case 
the space is not compact. 

3. Show that a totally bounded metric space is bounded. (The 

preceding exercise shows that the converse is not true.) 

4. Let f be a continuous function mapping the metric space M into

the metric space N. Show that if M' is a compact subset of M, 
then f(M') is compact. (In particular, by choosing N = R, 

* A metric space is said to be bounded if its diameter is finite.





we see that a continuous real-valued function defined on a 
compact metric space is bounded and attains its sup and inf.) 

5. Prove that a continuous function from a compact metric space
into a metric space (not necessarily compact) is uniformly
continuous.

6. Show that, although each of the functions $\sin nx$, $n = 1, 2, 3, \ldots$,
is uniformly continuous on the interval $[0, 2\pi]$, the collection
of all these functions is not equicontinuous.

7. Let m be a positive constant, and consider the subset of $C([a, b])$
consisting of all functions possessing at each point of [a, b]
a derivative not exceeding m in absolute value. Show that
this collection of functions is equicontinuous.

8. Prove directly (i.e., by using standard techniques of real analysis)
the following particular case of Theorem 2: Let $\{f_\alpha\}$ be an
infinite equicontinuous collection of real-valued functions
defined on the interval [a, b]. From every uniformly bounded
sequence drawn from this collection it is possible to extract
a subsequence which converges uniformly on the specified
interval. Furthermore, the condition of uniform boundedness
may be replaced by the (seemingly) weaker condition that the
chosen sequence should be bounded at any one point of the
interval.

9. Suppose that the metric space M contains an infinite subset $M'$
from which it is impossible to extract a convergent subsequence.
Show that each point of $M'$ can be covered by an
open set which contains no other point of $M'$ and that each
point of $M - M'$ can be covered by an open set which
contains no point of $M'$, and deduce from this result that
compactness implies sequential compactness.

10. Prove, with the aid of the preceding exercise, that sequential
compactness is equivalent to the B-W property. (Hint:
Recall the proof of the Bolzano-Weierstrass theorem on the
real line.)

11. Prove that compactness implies the B-W property (and hence, by
the preceding exercise, sequential compactness).

12. Let the collection $\{G_\alpha\}$ constitute an open covering of a
sequentially compact metric space. Show that there exists a positive
number $\delta$ (called a Lebesgue number of the collection $\{G_\alpha\}$)
such that whenever $\rho(x, y) < \delta$ there exists at least one $G_\alpha$
which contains both x and y.





13. Use the preceding result to show that sequential compactness 

implies total boundedness. 

14. Prove that sequential compactness implies compactness. 

15. Prove that compactness is equivalent to completeness together 

with total boundedness. 

16. Let $f_1, f_2, f_3, \ldots$ be a sequence of continuous real-valued functions
defined on a compact space M, suppose that the inequalities
$f_1(x) \le f_2(x) \le f_3(x) \le \cdots$ hold at every point x of M, and
suppose that this sequence of functions converges (everywhere
on M) to a continuous function. Prove that the convergence
is uniform. (This is Dini's theorem of monotone convergence.)



CHAPTER 2 


LEBESGUE MEASURE 
AND INTEGRATION 


§1. INTRODUCTORY REMARKS 

Almost all significant applications of the theory of linear spaces 
involve to a greater or lesser extent the concept of integration. The 
Riemann integral, useful though it is for many purposes, has proven to 
be quite inadequate in many problems of a somewhat sophisticated 
nature, and it has been largely superseded by the Lebesgue integral and 
extensions of the latter. 

The deficiencies of the Riemann integral can be roughly summed up 
in two brief statements: first, not enough functions are integrable; 
second, and more seriously, limiting operations often lead to insurmountable
difficulties. In particular, if the functions $f_1, f_2, \ldots$ are each
integrable on a bounded closed interval $[a, b]$ and if they converge
everywhere in this interval to a function $f$, it is not always true that

$$\lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx.$$

Three things can go wrong: (a) the limit on the left side may not exist;
(b) even if this limit does exist, the function $f$ may not be integrable, so
that the right side may be meaningless; (c) even if both sides of the
preceding equality exist, they may nevertheless be unequal. (Cf. Exercise 1.)

In the early 1900's H. Lebesgue presented a new theory which
eliminates most of the deficiencies of the Riemann theory, and it has had 
a tremendous impact on the development of mathematics since that time. 
In this chapter we present a limited portion of Lebesgue’s theory, and in 
later chapters we shall see how this theory plays an important role in many 
parts of linear analysis. 

There are many ways to present Lebesgue’s theory, and it is not always 
easy to show that two different developments are actually equivalent. 
Roughly speaking, the various approaches fall into two main groups. In 
one group are the approaches in which measure comes first and integration 
comes second; in the second group the order is reversed. We shall present 
an approach of the first kind. 

Although vast generalizations of Lebesgue’s theory have been made, 
we shall confine attention in most of this chapter to the case that all sets 
under consideration are subsets of the real line R. (In the closing sections 
we shall indicate briefly how the Lebesgue theory is extended to integration 
over subsets of the plane and to integration over more abstract sets.) In 
order to avoid endless repetition of the word “open,” we make the 
convention that any set named O (with or without a subscript, such as 
$O_1$, $O_2$, $O_k$, and so forth) is open (in terms of the metric of R). It must
constantly be kept in mind that any open subset of R can be expressed
in a unique manner (except for order) as the disjoint union of a finite or
countably infinite collection of open intervals. These intervals are termed
the components of the given set. (An interval need not be of finite extent;
for example, the unique representation of $R - \{3\}$ as a disjoint union of
open intervals is $(-\infty, 3) \cup (3, +\infty)$.) For the benefit of the reader who
may be unacquainted with the preceding fact concerning the structure of
open subsets of R, we present in Appendix E a brief proof.

We shall find it necessary to deal with infinite series whose terms are 
either positive numbers or $+\infty$; since the reader may not be at ease with
the extended real number system, we present in Appendix F a brief
review of the theory of convergent series of real numbers and an indication
of how the pertinent ideas have to be modified when one introduces the
ideal number $+\infty$. The essential point for our purposes is the following
fact: To every series $a_1 + a_2 + a_3 + \cdots$, where each of the $a$'s is a
positive real number or $+\infty$, we can associate a sum which is also a
positive real number or $+\infty$, and this sum is unaffected by any
rearrangement of the order of the terms.

Exercise 

1. Illustrate each of the three unpleasant possibilities cited in the
second paragraph. 





§2. MEASURE OF OPEN SETS 

Measure (more precisely, Lebesgue measure) is an extension of 
the concept of length. We begin by defining, in an obvious manner, the 
measure of open sets, and then we shall prove certain properties of the 
measure. (While the definition is obvious, the proofs of certain seemingly 
obvious properties require considerable skill.) In the following section
we shall extend the concept of measure and the proofs of its properties to 
sets which are not necessarily open, and then in the remaining sections of 
the main part of the chapter we shall define the Lebesgue integral and 
establish its most important properties. 

Before undertaking the systematic study of the theory developed in 
the present and following sections, the reader may find it possible to get 
an overall feeling for the subject by rapidly reading the definitions and the 
theorems several times. Hopefully, he will then see the general pattern 
of development and will not find the technical details such a stumbling- 
block as he might otherwise. 


Definition 1: The measure of O, denoted $\mu(O)$, is the sum of the
lengths of the components of O. (For example, $\mu(R) = +\infty$,
$\mu((0, +\infty)) = +\infty$, $\mu((1, 3)) = 2$, $\mu((1, 3) \cup (7, 11)) = 6$,
$\mu(\emptyset) = 0$.)

Note that this definition makes sense, because of the unique decomposition
of O into components and the fact that the sum of the lengths of
the components does not depend on the order of summation. Note also
that if O is an open interval, $\mu(O)$ coincides with the length of O. Thus,
measure is indeed a generalization of length.
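The definition can be tried out on finite unions of open intervals. The following is only a minimal sketch, not part of the text's development: the function names `components` and `measure` are hypothetical, and an open set is assumed to be given as a finite list of (possibly overlapping) open intervals `(a, b)`, with `math.inf` standing for an infinite endpoint.

```python
import math

def components(intervals):
    """Merge open intervals (a, b) into the disjoint components of their union."""
    # Drop empty intervals and sort by left endpoint.
    pieces = sorted((a, b) for a, b in intervals if a < b)
    merged = []
    for a, b in pieces:
        # Open intervals overlap only when a lies strictly left of the previous
        # right endpoint; (1, 2) and (2, 3) stay separate because 2 is missing.
        if merged and a < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

def measure(intervals):
    """Sum of the lengths of the components; may be math.inf."""
    return sum(b - a for a, b in components(intervals))

print(measure([(1, 3)]))            # 2
print(measure([(1, 3), (7, 11)]))   # 6
print(measure([(0, math.inf)]))     # inf
print(measure([]))                  # 0
```

The printed values reproduce the examples given in the definition; overlapping input intervals are merged first, so each point is counted once.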

Theorem 1: If $O_1 \subseteq O_2$, then $\mu(O_1) \le \mu(O_2)$.

Proof: Break up $O_2$ into its components, $C_1, C_2, \ldots$. Each
component of $O_1$ is clearly entirely contained in one of the $C_k$. Thus, for
each $C_k$ we can determine the disjoint collection (perhaps empty, at most
countably infinite) of components of $O_1$ contained in $C_k$; we denote these
components $\Gamma_{1k}, \Gamma_{2k}, \ldots$. If, for a particular $k$, the corresponding
collection of $\Gamma$'s is empty or finite, then clearly

$$\mu(\Gamma_{1k}) + \mu(\Gamma_{2k}) + \cdots \le \mu(C_k);$$

if $C_k$ contains infinitely many components of $O_1$, the preceding inequality
still holds, as is seen by enumerating these components, forming finite
sums,* and passing to the limit. If we now sum the preceding inequality
over all values of $k$, we have on the left the sum of the lengths, in some
order, of all components of $O_1$, while on the right we have the sum of the
lengths, in some order, of all components of $O_2$. Hence we have obtained
the desired result, $\mu(O_1) \le \mu(O_2)$.

Theorem 2: If $O_1, O_2, \ldots$ are disjoint, then
$\mu(O_1 \cup O_2 \cup \cdots)$† $= \mu(O_1) + \mu(O_2) + \cdots$.

Proof: Since the $O_k$'s are disjoint, it is immediately seen that the
components of $O_1 \cup O_2 \cup \cdots$ are simply the components of $O_1$, the
components of $O_2, \ldots$, without duplication. Thus the sum on the right is
simply a rearrangement of the sum on the left, and so the equality must
hold.

Theorem 3: Whether or not the sets $O_1, O_2, \ldots$ overlap,

$$\mu(O_1 \cup O_2 \cup \cdots) \le \mu(O_1) + \mu(O_2) + \cdots.$$

Proof: If the right side of the inequality is $+\infty$, there is nothing
to prove. Therefore, we may assume that $\mu(O_1) + \mu(O_2) + \cdots$ is finite.
Let $C_1, C_2, \ldots$ be the components of $O_1 \cup O_2 \cup \cdots$, and let $\Gamma_{ik}$ denote
$O_i \cap C_k$. Suppose that we can prove the inequality

$$\mu(\Gamma_{1k}) + \mu(\Gamma_{2k}) + \cdots \ge \mu(C_k). \tag{2-1}$$

Then by summing over $k$ we would obtain

$$\sum_k \{\mu(\Gamma_{1k}) + \mu(\Gamma_{2k}) + \cdots\} \ge \sum_k \mu(C_k). \tag{2-2}$$

A moment's thought will show that the left side of (2-2) is the summation,
in some order, of the lengths of the individual components of the individual
$O_k$'s, while on the right side we have the summation, in some order,
of the lengths of the components of $O_1 \cup O_2 \cup \cdots$. Thus (2-2) is
equivalent to the assertion of the theorem. However, the inequality (2-1),
which seems obvious, but isn't, must be established. (See the Remark at
the end of the proof.)


* Here we employ the following fact: If the intervals $J_1, \ldots, J_n$ are disjoint
and are all contained in the interval $I$, then the sum of the lengths of the $J$'s cannot
exceed the length of $I$. It clearly does not matter whether the intervals are open, closed,
or half-open.

† Here and elsewhere the notation $\cdots$ implies that we are dealing with either a
finite or countably infinite collection of sets. Also, keep in mind that expressions such
as $\mu(O_1 \cup O_2 \cup \cdots)$ are meaningful, since any union of open sets is open.





Choose a particular $C_k$; it is of finite length or infinite length. We
shall consider only the former case, but the case of infinite length is
disposed of by a simple extension of the argument which we now present.
(In fact, it will become apparent, when the argument that is about to be
presented is extended to the case of an interval of infinite length, that no
such interval can exist when the hypothesis made in the second sentence
of this proof is satisfied. Of course, this is exactly what one would expect.)
Now suppose that (2-1) is false; that is, suppose that

$$\mu(\Gamma_{1k}) + \mu(\Gamma_{2k}) + \cdots < \mu(C_k). \tag{2-3}$$

From the definition of the $C$'s and the $\Gamma$'s (a drawing may be helpful), it
is evident that

$$\Gamma_{1k} \cup \Gamma_{2k} \cup \cdots = C_k. \tag{2-4}$$

Since $C_k$ is an open interval of finite length, we may write $C_k = (a, b)$,
so that (2-3) assumes the form

$$\mu(\Gamma_{1k}) + \mu(\Gamma_{2k}) + \cdots < b - a. \tag{2-5}$$

We can find a number $a'$ slightly larger than $a$ and a number $b'$ slightly
smaller than $b$ so that

$$\mu(\Gamma_{1k}) + \mu(\Gamma_{2k}) + \cdots < b' - a'. \tag{2-6}$$

However, from (2-4) we observe that $\Gamma_{1k} \cup \Gamma_{2k} \cup \cdots \supseteq [a', b']$, and it
now follows from the Heine-Borel theorem that a finite number of the open
sets $\Gamma_{1k}, \Gamma_{2k}, \ldots$ cover $[a', b']$. Now, it is really obvious (and true!)
that the sum of the lengths of this finite collection of $\Gamma$'s must exceed
$b' - a'$; and since this finite sum cannot exceed the left side of (2-6), we
obtain

$$\mu(\Gamma_{1k}) + \mu(\Gamma_{2k}) + \cdots > b' - a'. \tag{2-7}$$

Since (2-6) and (2-7) contradict each other, we conclude that (2-1) is
true, and so the proof is complete.

Remark: The subtlety that is often misunderstood or overlooked
is the following: Might it be possible to cover an interval $I$ by a countable
collection of intervals $\{I_1, I_2, \ldots\}$ whose lengths add up to less than the
length of $I$? The preceding argument shows that the answer to this
question is no, but observe that the proof requires use of the Heine-Borel
theorem, and there is no way of avoiding it (or something equivalent to it).
Thus, in this case, common sense furnishes the correct answer, but the
reader hopefully realizes that common sense is often misleading. Exercise
1 should help the reader gain a deeper understanding of the fact that the
theorem which we have proven is very far from obvious.





Theorem 4: $\mu(O_1 \cup O_2) + \mu(O_1 \cap O_2) = \mu(O_1) + \mu(O_2)$. (Roughly
speaking, this asserts that when we add the measures of $O_1$ and $O_2$ we
count the overlapping part twice. The reader may find it helpful to think
of $O_1$ and $O_2$ as clubs and $\mu(O_1)$ and $\mu(O_2)$ as the number of members.)

Proof: Since this theorem is trivially true if either $\mu(O_1)$ or $\mu(O_2)$
is $+\infty$ (cf. Theorem 1), we may assume that $\mu(O_1)$ and $\mu(O_2)$ are both
finite.

(a) If $O_1$ and $O_2$ are disjoint, Theorem 2 settles the matter.

(b) Suppose that $O_1$ and $O_2$ are not disjoint, but suppose that $O_1$ and
$O_2$ each have a finite number of components and that each component of
$O_1$ is either disjoint from $O_2$ or coincides with a component of $O_2$, and
vice-versa. (For example, let $O_1 = (1, 2) \cup (3, 5)$ and let
$O_2 = (-7, 1) \cup (3, 5) \cup (8, 11)$.) Then

$$\mu(O_1) = \sum \mu(\text{shared components}) + \sum \mu(\text{unshared components})$$

and

$$\mu(O_2) = \sum \mu(\text{shared components}) + \sum \mu(\text{unshared components}).$$

Adding, we clearly obtain

$$\mu(O_1) + \mu(O_2) = \Big\{\sum \mu(\text{shared components}) + \sum \mu(\text{all unshared components})\Big\} + \sum \mu(\text{shared components}).$$

The quantity in { } is clearly $\mu(O_1 \cup O_2)$ and the last sum is clearly
$\mu(O_1 \cap O_2)$; thus we have the desired equality.

(c) Now we still assume that $O_1$ and $O_2$ overlap and that each of
them has a finite number of components, but we drop the restriction
imposed in (b) concerning the nature of the overlap. This means (a
simple drawing may help) that the components of $O_1$ are permitted to have
endpoints which are interior points of $O_2$, and conversely. Now simply
remove from $O_1$ those points (if any) which are endpoints of components
of $O_2$, and conversely. We then obtain sets $\tilde{O}_1$ and $\tilde{O}_2$ which are clearly
open (since only a finite number of points are removed), and these new
sets satisfy the conditions imposed in (b). Thus

$$\mu(\tilde{O}_1) + \mu(\tilde{O}_2) = \mu(\tilde{O}_1 \cup \tilde{O}_2) + \mu(\tilde{O}_1 \cap \tilde{O}_2). \tag{2-8}$$

From the simple relationship between the sets $O_1$, $O_2$, $\tilde{O}_1$, $\tilde{O}_2$ it is clear
that $\tilde{O}_1$ and $\tilde{O}_2$ may be replaced in (2-8) by $O_1$ and $O_2$, respectively, and
so we have obtained once again the desired equality.





(d) Finally, suppose that at least one of the two given sets has
infinitely many components. Given any $\epsilon > 0$, split $O_1$ into $A_1 \cup R_1$ and
$O_2$ into $A_2 \cup R_2$, where $A_1$ and $A_2$ consist of a finite number of the
components of $O_1$ and $O_2$, while $R_1$ and $R_2$, the remaining parts of $O_1$ and $O_2$,
have measures less than $\epsilon$. (Note carefully that the finiteness of $\mu(O_1)$ and
$\mu(O_2)$ is needed at this point; think out carefully how $O_1$ and $O_2$ can be
broken up in the indicated manner.) Then, using some of the earlier
results, we obtain: (Convince yourself of the correctness of each step!!)

$$\begin{aligned}
&\mu(O_1 \cup O_2) + \mu(O_1 \cap O_2) - \mu(O_1) - \mu(O_2) \\
&\quad = \mu((A_1 \cup A_2) \cup (R_1 \cup R_2)) + \mu((A_1 \cap A_2) \cup (A_1 \cap R_2) \cup (A_2 \cap R_1) \cup (R_1 \cap R_2)) - \mu(A_1 \cup R_1) - \mu(A_2 \cup R_2) \\
&\quad \le \mu(A_1 \cup A_2) + \mu(R_1 \cup R_2) + \mu(A_1 \cap A_2) + \mu(A_1 \cap R_2) + \mu(A_2 \cap R_1) + \mu(R_1 \cap R_2) - \mu(A_1) - \mu(A_2) \\
&\quad \le \{\mu(A_1 \cup A_2) + \mu(A_1 \cap A_2) - \mu(A_1) - \mu(A_2)\} + \mu(R_1) + \mu(R_2) + \mu(R_2) + \mu(R_1) + \mu(R_1) \\
&\quad < \{\mu(A_1 \cup A_2) + \mu(A_1 \cap A_2) - \mu(A_1) - \mu(A_2)\} + 5\epsilon.
\end{aligned}$$

Now, by the previous parts of this proof,
$\mu(A_1 \cup A_2) + \mu(A_1 \cap A_2) - \mu(A_1) - \mu(A_2) = 0$.
Hence, $\mu(O_1 \cup O_2) + \mu(O_1 \cap O_2) - \mu(O_1) - \mu(O_2) < 5\epsilon$. The left side
of this inequality has nothing to do with $\epsilon$, while the right side can be
pushed as close as desired to zero. Hence,

$$\mu(O_1 \cup O_2) + \mu(O_1 \cap O_2) - \mu(O_1) - \mu(O_2) \le 0. \tag{2-9}$$

On the other hand, using the same breakup as before, we obtain:

$$\begin{aligned}
&\mu(O_1 \cup O_2) + \mu(O_1 \cap O_2) - \mu(O_1) - \mu(O_2) \\
&\quad \ge \mu(A_1 \cup A_2) + \mu(A_1 \cap A_2) - \mu(A_1) - \mu(R_1) - \mu(A_2) - \mu(R_2) \\
&\quad > \{\mu(A_1 \cup A_2) + \mu(A_1 \cap A_2) - \mu(A_1) - \mu(A_2)\} - \epsilon - \epsilon \\
&\quad = -2\epsilon \quad (\text{since } \{\ \} = 0).
\end{aligned}$$

Hence, arguing as before about $\epsilon$, we obtain

$$\mu(O_1 \cup O_2) + \mu(O_1 \cap O_2) - \mu(O_1) - \mu(O_2) \ge 0. \tag{2-10}$$

Putting (2-9) and (2-10) together, we obtain the desired result.


Remark: The details may appear bewildering, but the idea is 
extremely simple — namely, that any set O of finite measure is almost a 
union of a finite number of components. Once we have proven the theorem 
for open sets of this type, we merely have to show that the extra bits $R_1$
and $R_2$ cannot destroy the equality.
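For open sets with finitely many components, the equality of Theorem 4 can be checked numerically. The following is only an illustrative sketch under stated assumptions: each set is represented as a finite list of open intervals, and the helper names `components`, `measure`, and `intersection` are hypothetical, not taken from the text. The first pair of sets is the example from part (b) of the proof; the second pair overlaps in the more general way allowed in part (c).

```python
def components(intervals):
    """Merge open intervals (a, b) into the disjoint components of their union."""
    pieces = sorted((a, b) for a, b in intervals if a < b)
    merged = []
    for a, b in pieces:
        # Merge only on genuine overlap; abutting open intervals stay separate.
        if merged and a < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

def measure(intervals):
    """Sum of the lengths of the components of the union."""
    return sum(b - a for a, b in components(intervals))

def intersection(xs, ys):
    """Intersection of two finite unions of open intervals, as a list of intervals."""
    out = []
    for a, b in components(xs):
        for c, d in components(ys):
            lo, hi = max(a, c), min(b, d)
            if lo < hi:
                out.append((lo, hi))
    return out

def both_sides(o1, o2):
    """Return (mu(O1 u O2) + mu(O1 n O2), mu(O1) + mu(O2))."""
    left = measure(o1 + o2) + measure(intersection(o1, o2))
    right = measure(o1) + measure(o2)
    return left, right

# Example from part (b): shared component (3, 5).
print(both_sides([(1, 2), (3, 5)], [(-7, 1), (3, 5), (8, 11)]))  # (16, 16)
# A general overlap, as in part (c).
print(both_sides([(0, 4)], [(2, 6), (3, 10)]))                   # (12, 12)
```

A check of this kind illustrates parts (a) through (c) only; the case of infinitely many components in part (d) rests on the limiting argument and is not captured by a finite computation.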





Exercise 

1. Let us define an open interval $(a, b)$ in Q (not in R) as the set of
all rational numbers $x$ satisfying the inequalities $a < x < b$,
where $a$ and $b$ are any real (not necessarily rational) numbers;
also, let $\mu((a, b))$ be defined as $b - a$. Prove that the interval
$(a, b)$ can be expressed as the union of countably many
intervals $(a_1, b_1), (a_2, b_2), \ldots$ such that
$\sum_{k=1}^{\infty} \mu((a_k, b_k)) < \mu((a, b))$; in fact, given any positive
number $\epsilon$, it is possible to choose the intervals $(a_k, b_k)$ in such
a manner that $\sum_{k=1}^{\infty} \mu((a_k, b_k)) < \epsilon$. Note carefully that this
result does not contradict Theorem 3; it simply shows that Theorem 3, which
is valid in R, is not valid in Q. (Hint: Exploit the fact that Q
is countably infinite.)


§3. MEASURE OF MORE GENERAL SETS 

In this section we shall assign a measure to a class of sets (which will 
be termed “measurable”) which contains all open sets; it will be necessary, 
of course, to demonstrate that the extended definition of measure agrees 
with the definition given in §2 when the extended definition is applied to 
an open set. 

Definition 1: A set $S$ is said to be measurable if for every positive
number $\epsilon$ there exist sets* $O$ and $\tilde{O}$ such that $S \subseteq O$,
$O - S \subseteq \tilde{O}$, and $\mu(\tilde{O}) < \epsilon$. (Note carefully that
$\mu(O) = +\infty$ is allowed.) Roughly speaking, this says that $S$ can be
covered by an open set without much waste.

Definition 2: If $S$ is measurable, $\mu_L(S)$ is defined as $\inf \mu(O)$ for all
sets $O$ which contain $S$. The number $\mu_L(S)$ (which may be $+\infty$) is called
the Lebesgue measure of $S$.

Remarks: (a) We use the symbol $\mu_L$ rather than $\mu$ because we have
not yet proven that open sets are measurable. Of course, this will be
proven very soon, as will the consistency of $\mu$ and $\mu_L$ (i.e., that $\mu$ and
$\mu_L$ coincide for every open set).

(b) In some developments of the Lebesgue theory the quantity
$\mu_L(S)$ is associated with every set $S$, whether or not $S$ is measurable.
(Note that the definition of $\mu_L(S)$ is applicable to any set $S$.) However,
most of the following theorems are false if $\mu_L$ is not restricted to
measurable sets. When $\mu_L$ is defined unrestrictedly, it is termed the outer
measure. One then also defines for each set an inner measure, and the
measurable sets (as previously defined) are those for which the inner and
outer measures coincide. (Actually, this statement is true only when the
outer measure is finite; a set $S$ can have inner and outer measures both
equal to $+\infty$ and yet be unmeasurable. Since we shall confine attention
entirely to measurable sets, we shall have no need to consider outer and
inner measure, and so we drop the discussion at this point.)

* For emphasis we recall our convention according to which $O$ and $\tilde{O}$ are
understood to be open sets.

Theorem 1: Any finite or countably infinite set $S$ is measurable,
and its Lebesgue measure is zero.

Proof: Given any positive number $\epsilon$, enumerate the points of $S$,
cover the first point with an open interval of length less than $\epsilon/2$, the
second point with an open interval of length less than $\epsilon/2^2$, and so forth.
The union of these intervals is an open set $A$, and
$\mu(A) < \epsilon/2 + \epsilon/2^2 + \cdots = \epsilon$ (by Theorem 2-3). Letting
$O = \tilde{O} = A$ and referring to the two
preceding definitions, we see, first, that $S$ is measurable, and, second,
that $\mu_L(S) < \epsilon$; since $\epsilon$ is arbitrarily small, $\mu_L(S) = 0$.
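The covering used in this proof can be imitated on a finite initial segment of an enumeration of the rationals in $[0, 1]$. This is only a sketch under stated assumptions: the enumeration by increasing denominator and the names `first_rationals` and `cover` are illustrative choices, and exact `Fraction` arithmetic is used so that the comparison with $\epsilon$ is not blurred by rounding.

```python
from fractions import Fraction

def first_rationals(count):
    """First `count` distinct rationals in [0, 1], listed by increasing denominator."""
    seen, out, q = set(), [], 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out

def cover(points, eps):
    """Cover the k-th point (k = 1, 2, ...) by an open interval of length eps / 2**(k + 1)."""
    boxes = []
    for k, x in enumerate(points, start=1):
        half = eps / 2 ** (k + 2)   # half-width; the length eps / 2**(k + 1) is < eps / 2**k
        boxes.append((x - half, x + half))
    return boxes

eps = Fraction(1, 100)
pts = first_rationals(50)
boxes = cover(pts, eps)
total = sum(b - a for a, b in boxes)
print(total < eps)   # True
```

However many points are enumerated, the total length stays below $\epsilon$, since the lengths are dominated by the geometric series $\epsilon/2 + \epsilon/2^2 + \cdots$.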

Theorem 2: If $S$ and $T$ are both measurable and if $S \subseteq T$, then
$\mu_L(S) \le \mu_L(T)$.

Proof: Obvious from the definition of $\mu_L$.

Theorem 3: If $S$ is open it is measurable, and $\mu_L(S) = \mu(S)$.

Proof: Referring to the definition of measurability, choose $O = S$
and $\tilde{O} = \emptyset$. This shows that $S$ is measurable. Now, referring to the
definition of $\mu_L$ (of a measurable set), we observe that $S$ is one of the open
sets that contain $S$, and so $\inf_{O \supseteq S} \mu(O)$ cannot exceed $\mu(S)$. Hence,
$\mu_L(S) \le \mu(S)$. Now, if $\mu_L(S)$ were less than $\mu(S)$, this would imply the
existence of an open set $T$ containing $S$ such that $\mu(T) < \mu(S)$, and this
would contradict Theorem 2-1. Hence, $\mu_L(S) = \mu(S)$. (Thus, as promised
previously, we have shown that $\mu_L$ and $\mu$ are consistently defined for
open sets. We may, therefore, henceforth denote the measure of any
measurable set $S$ by $\mu(S)$, omitting the subscript $L$.)

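Theorem 4: If $S_1, S_2, \ldots$ are each measurable, so is $S_1 \cup S_2 \cup \cdots$.
Furthermore, $\mu(S_1 \cup S_2 \cup \cdots) \le \mu(S_1) + \mu(S_2) + \cdots$.

Proof: (a) Given $\epsilon > 0$, associate with each set $S_k$ a set $O_k$ and a
set $\tilde{O}_k$ such that $S_k \subseteq O_k$, $O_k - S_k \subseteq \tilde{O}_k$,
$\mu(\tilde{O}_k) < \epsilon/2^k$. Then, clearly,
$\bigcup S_k \subseteq \bigcup O_k$, $\bigcup O_k - \bigcup S_k \subseteq \bigcup \tilde{O}_k$, and
$\mu(\bigcup \tilde{O}_k) < \epsilon$. Hence, $\bigcup S_k$ is measurable.

(b) Given $\epsilon > 0$, choose $O_k$ (not necessarily the same $O_k$ as in part (a))
such that $\mu(O_k) < \mu(S_k) + \epsilon/2^k$ and $S_k \subseteq O_k$. Then, by Theorem 2-3,
$\mu(\bigcup O_k) \le \sum \mu(O_k) \le \epsilon + \sum \mu(S_k)$. Since $\bigcup O_k$ is an open set
containing $\bigcup S_k$, it follows from the definition of measure that
$\mu(\bigcup S_k) \le \mu(\bigcup O_k)$. Hence $\mu(\bigcup S_k) \le \epsilon + \sum \mu(S_k)$. Since $\epsilon$ can
be chosen as small as we wish, we obtain $\mu(\bigcup S_k) \le \sum \mu(S_k)$.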

Theorem 5: If $O$ is open and $\epsilon > 0$ is given, there exists a closed
set $F$ such that $F \subseteq O$ and $\mu(O - F) < \epsilon$. (Note that $O - F$ is open, and
hence measurable.)

Proof: (a) First suppose that $\mu(O) < +\infty$ and that $O$ consists of
a finite number of components, say
$O = (a_1, b_1) \cup (a_2, b_2) \cup \cdots \cup (a_n, b_n)$.
Choose compact subintervals $[a_1', b_1'], [a_2', b_2'], \ldots, [a_n', b_n']$
such that $\mu((a_1, b_1) - [a_1', b_1']) < \epsilon/2$,
$\mu((a_2, b_2) - [a_2', b_2']) < \epsilon/2^2$, and
so forth, and let $F$ be the union of these compact subintervals. Then $F$ is
closed and $O - F$ consists of $2n$ open intervals, $(a_1, a_1')$, $(b_1', b_1)$,
$(a_2, a_2')$, $(b_2', b_2)$, $\ldots$, and their total length is less than
$\epsilon/2 + \epsilon/2^2 + \cdots + \epsilon/2^n < \epsilon$. Hence, $\mu(O - F) < \epsilon$.

(b) Now we still suppose that $\mu(O) < +\infty$ but we allow $O$ to
have infinitely many components. We write $O$ in the form $A \cup B$,
where $A$ consists of a finite number of the components of $O$ and
$\mu(B) < \epsilon/2$. Then by part (a) we can construct a closed subset $F$ of $A$ such that
$\mu(A - F) < \epsilon/2$. Thus,
$\mu(O - F) = \mu(B \cup (A - F)) = \mu(B) + \mu(A - F) < \epsilon/2 + \epsilon/2 = \epsilon$.

(c) Now suppose that $\mu(O) = +\infty$ (so that $O$ is certainly unbounded
in at least one direction). Let $O_n = O \cap (n, n + 1)$, $n = 0, \pm 1, \pm 2, \ldots$.
For each $n$ construct a closed subset $F_n$ of $O_n$ such that
$\mu(O_n - F_n) < 2^{-|n|}\epsilon/3$; this is certainly possible, by parts (a) and (b). Now, even
though the union of infinitely many closed sets is not always closed,
nevertheless $F = \bigcup_{n=-\infty}^{\infty} F_n$ is closed. (Why?) Observe that
$O - F = \{\bigcup_n (O_n - F_n)\} \cup$ (set of integers belonging to $O$), and so

$$\mu(O - F) \le \Big\{\sum_n \mu(O_n - F_n)\Big\} + \mu(\text{set of all integers}) = \sum_n \mu(O_n - F_n) < \sum_{n=-\infty}^{\infty} 2^{-|n|}\epsilon/3 = \epsilon,$$

and so the proof is complete.
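The construction in part (a) is easy to carry out explicitly. The following is a minimal sketch, assuming that $O$ is given as a finite list of disjoint bounded open intervals with exact rational endpoints; the function name `inner_closed_set` and the particular trimming rule are illustrative choices rather than anything prescribed by the proof.

```python
from fractions import Fraction

def inner_closed_set(open_intervals, eps):
    """Shrink each bounded open interval (a_k, b_k) to a compact [a_k', b_k'] inside it."""
    closed = []
    for k, (a, b) in enumerate(open_intervals, start=1):
        # Trim delta from each end so that the removed length 2 * delta is
        # below eps / 2**k, and small enough that the trimmed interval is non-empty.
        delta = min(eps / 2 ** (k + 2), (b - a) / 4)
        closed.append((a + delta, b - delta))
    return closed

O = [(Fraction(0), Fraction(1)),
     (Fraction(2), Fraction(5)),
     (Fraction(7), Fraction(15, 2))]
eps = Fraction(1, 10)
F = inner_closed_set(O, eps)

# Total length of the 2n open intervals making up O - F.
leftover = sum((b - a) - (d - c) for (a, b), (c, d) in zip(O, F))
print(leftover < eps)   # True
```

Each pair in `F` is to be read as a closed interval; the removed pieces have lengths dominated by $\epsilon/2 + \epsilon/2^2 + \cdots + \epsilon/2^n$, exactly as in the proof.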

Theorem 6: If $A$ is measurable and $\epsilon > 0$ is given, there exists a
closed subset $F$ of $A$ such that $A - F$ can be covered by an open set of
measure $< \epsilon$.

Proof: Referring to the definition of measurability, form open sets
$O$ and $\tilde{O}$ such that $A \subseteq O$, $O - A \subseteq \tilde{O}$, $\mu(\tilde{O}) < \epsilon/2$. Next, construct a
closed set $G$ such that $G \subseteq O$ and $\mu(O - G) < \epsilon/2$. Let $F = G - \tilde{O}$.
Then $F$ is closed, $F \subseteq A$, and $A - F \subseteq \tilde{O} \cup (O - G)$. (These two
inclusions should be checked carefully.) Clearly, $\tilde{O} \cup (O - G)$ is open,
and its measure is at most $\mu(\tilde{O}) + \mu(O - G)$, which in turn is less than $\epsilon$.
Thus, the proof is complete.

Remark: Roughly speaking, Theorem 5 asserts that any open set 
can be almost exhausted by a closed set, while Theorem 6 asserts that any 
measurable set can be almost exhausted by a closed set. 

Theorem 7: $A$ is measurable iff $A^c$ is measurable. (In particular,
since all open sets are measurable, it now follows that all closed sets are
measurable.)

Proof: Suppose $A$ is measurable. Given $\epsilon > 0$, there exists a
closed set $F$ such that $F \subseteq A$ and $A - F$ can be covered by an open set
$V$ of measure $< \epsilon$. Then $F^c \supseteq A^c$ and $F^c - A^c = A - F \subseteq V$. Thus, $A^c$
has been covered by an open set, $F^c$, with $F^c - A^c$ contained in an open
set, $V$, of measure $< \epsilon$. Hence $A^c$ is measurable.

Conversely, if $A^c$ is measurable, we now know that $(A^c)^c$ is measurable;
but $(A^c)^c = A$.

Theorem 8: If $A_1, A_2, \ldots$ are each measurable, so is $\bigcap A_k$.

Proof: $A_1^c, A_2^c, \ldots$ are each measurable, by Theorem 7, and, by
Theorem 4, $A_1^c \cup A_2^c \cup A_3^c \cup \cdots$ is measurable. By Theorem 7,
$(A_1^c \cup A_2^c \cup A_3^c \cup \cdots)^c$ is also measurable; but this is $\bigcap A_k$.

Theorem 9: If $A$ and $B$ are measurable, so is $A - B$.

Proof: $B^c$ is measurable by Theorem 7, so $A \cap B^c$ is measurable
by Theorem 8; but $A \cap B^c = A - B$.

Theorem 10: If $A$ and $B$ are measurable and disjoint,
$\mu(A \cup B) = \mu(A) + \mu(B)$. (By Theorem 4 we know that $A \cup B$ is measurable.)

Proof: If either $\mu(A)$ or $\mu(B)$ is $+\infty$ there is nothing to prove, so
we assume that $\mu(A)$ and $\mu(B)$ are both finite. Given $\epsilon > 0$, find sets $O_1$
and $\tilde{O}_1$ such that $A \subseteq O_1$, $O_1 - A \subseteq \tilde{O}_1$, $\mu(\tilde{O}_1) < \epsilon$. Since $O_1 - A$ is
certainly measurable (by Theorem 9), $\mu(O_1 - A) \le \mu(\tilde{O}_1) < \epsilon$. Call this
set (i.e., $O_1 - A$) $R_1$. Thus, $O_1 = A \cup R_1$, where $\mu(R_1) < \epsilon$, and so
$\mu(A) \le \mu(O_1) \le \mu(A) + \mu(R_1) < \mu(A) + \epsilon$. Similarly, we can find a
set $O_2$ such that $B \subseteq O_2$ and $\mu(O_2 - B) < \epsilon$, and so
$\mu(B) \le \mu(O_2) < \mu(B) + \epsilon$; we write $R_2 = O_2 - B$. But, by Theorem 2-4,
$\mu(O_1 \cup O_2) + \mu(O_1 \cap O_2) - \mu(O_1) - \mu(O_2) = 0$. On the other hand,

$$\begin{aligned}
&\mu(O_1 \cup O_2) + \mu(O_1 \cap O_2) - \mu(O_1) - \mu(O_2) \\
&\quad = \mu((A \cup B) \cup (R_1 \cup R_2)) + \mu((A \cap B) \cup (A \cap R_2) \cup (B \cap R_1) \cup (R_1 \cap R_2)) - \mu(O_1) - \mu(O_2) \\
&\quad \le \mu(A \cup B) + \mu(R_1) + \mu(R_2) + \mu(R_2) + \mu(R_1) + \mu(R_1) - \mu(A) - \mu(B) \\
&\quad < 5\epsilon + \{\mu(A \cup B) - \mu(A) - \mu(B)\},
\end{aligned}$$

and so $\mu(A \cup B) - \mu(A) - \mu(B) > -5\epsilon$. Since $\epsilon$ can be chosen arbitrarily
close to zero, we obtain $\mu(A \cup B) - \mu(A) - \mu(B) \ge 0$, or
$\mu(A) + \mu(B) \le \mu(A \cup B)$. On the other hand, we can find open sets $U_1$ and $U_2$ such that
$A \subseteq U_1$, $B \subseteq U_2$, and $\mu(U_1) < \mu(A) + \epsilon$, $\mu(U_2) < \mu(B) + \epsilon$. Since
$U_1 \cup U_2$ is an open set containing $A \cup B$,
$\mu(A \cup B) \le \mu(U_1 \cup U_2) \le \mu(U_1) + \mu(U_2) < \mu(A) + \epsilon + \mu(B) + \epsilon = \mu(A) + \mu(B) + 2\epsilon$.
Since $\epsilon$ is arbitrarily small, $\mu(A \cup B) \le \mu(A) + \mu(B)$. Combining this with an
earlier result obtained during this proof, we obtain
$\mu(A \cup B) = \mu(A) + \mu(B)$.

Remark: Note where the disjointness of A and B is used. As in 
Theorem 2-4, the details may appear bewildering, but the essential idea 
is very simple. 

Theorem 11: If $A_1, A_2, \ldots$ are measurable and pairwise
disjoint, then $\mu(A_1 \cup A_2 \cup \cdots) = \mu(A_1) + \mu(A_2) + \cdots$. (The
measurability of $A_1 \cup A_2 \cup \cdots$ is known from Theorem 4.)

Proof: By Theorem 4 we already know that the left side cannot
exceed the right side. It therefore suffices to prove the reverse inequality.
From Theorem 10 and finite induction we know that, for $1 \le n < +\infty$,

$$\mu(A_1 \cup A_2 \cup \cdots \cup A_n) = \mu(A_1) + \mu(A_2) + \cdots + \mu(A_n).$$

Since $\bigcup_{k=1}^{\infty} A_k \supseteq \bigcup_{k=1}^{n} A_k$ we know that

$$\mu(A_1 \cup A_2 \cup \cdots) \ge \mu(A_1) + \mu(A_2) + \cdots + \mu(A_n).$$

(Note that the left side has nothing to do with $n$.) Letting $n$ increase
without bound, we obtain the desired inequality, namely,

$$\mu(A_1 \cup A_2 \cup \cdots) \ge \sum_{k=1}^{\infty} \mu(A_k).$$

Theorem 12: If $B \subseteq A$, if both sets are measurable, and if $\mu(A)$ is
finite, then $\mu(A - B) = \mu(A) - \mu(B)$.

Proof: By Theorem 9, $A - B$ is measurable. We may write
$A = B \cup (A - B)$, so by Theorem 10 we are sure that
$\mu(A) = \mu(B) + \mu(A - B)$, and so $\mu(A - B) = \mu(A) - \mu(B)$. (Note carefully that the
last step is invalid if $\mu(A) = +\infty$.)





Theorem 13: If $A_1 \subseteq A_2 \subseteq A_3 \subseteq \cdots$ and if each $A_k$ is measurable,
then their union is measurable, and
$\mu(\bigcup_{k=1}^{\infty} A_k) = \lim_{n\to\infty} \mu(A_n)$.

Proof: (a) It follows directly from Theorem 4 that the union is
measurable.

(b) Since $A_n \subseteq A_{n+1}$, $\mu(A_n) \le \mu(A_{n+1})$, and so $\lim_{n\to\infty} \mu(A_n)$ must
exist, either as a finite number or $+\infty$.

(c) If any particular set has infinite measure, say $A_k$, then clearly
$\mu(A_m) = +\infty$ whenever $m > k$, and then $\mu(\bigcup_{k=1}^{\infty} A_k)$ must clearly be
$+\infty$, which is also $\lim_{n\to\infty} \mu(A_n)$.

(d) Therefore, we may assume that each $A_n$ has finite measure.
Then by Theorem 11 we obtain

$$\mu\Big(\bigcup_{k=1}^{\infty} A_k\Big) = \mu(A_1) + \mu(A_2 - A_1) + \mu(A_3 - A_2) + \cdots.$$

Taking any partial sum of the infinite series on the right side and using
Theorem 12, we obtain

$$\mu(A_1) + \mu(A_2 - A_1) + \cdots + \mu(A_n - A_{n-1}) = \mu(A_1) + \mu(A_2) - \mu(A_1) + \mu(A_3) - \mu(A_2) + \cdots + \mu(A_n) - \mu(A_{n-1}) = \mu(A_n).$$

Thus, the sum of the infinite series must be $\lim_{n\to\infty} \mu(A_n)$, and so
we have the desired result.

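Theorem 14: If $A_1 \supseteq A_2 \supseteq A_3 \supseteq \cdots$, if all the $A$'s are measurable, and if
$\mu(A_1)$ is finite, then $\mu(A_1 \cap A_2 \cap \cdots) = \lim_{n\to\infty} \mu(A_n)$.

Proof: Since the sets are nested and $\mu(A_1)$ is finite, each $A_k$ has
finite measure and these measures form a monotone non-increasing
sequence of real non-negative numbers, and so the limit of $\mu(A_n)$ exists.
Also, by Theorem 8, $A_1 \cap A_2 \cap \cdots$ is measurable. Now, observe that $A_1$
can be expressed as a disjoint union in the following way:

$$A_1 = \Big(\bigcap_{k=1}^{\infty} A_k\Big) \cup (A_1 - A_2) \cup (A_2 - A_3) \cup \cdots.$$

By Theorem 11, we obtain

$$\mu(A_1) = \mu\Big(\bigcap_{k=1}^{\infty} A_k\Big) + \sum_{k=1}^{\infty} \mu(A_k - A_{k+1}) = \mu\Big(\bigcap_{k=1}^{\infty} A_k\Big) + \lim_{n\to\infty} \sum_{k=1}^{n} \mu(A_k - A_{k+1}),$$

and by Theorem 12 we then obtain

$$\begin{aligned}
\mu(A_1) &= \mu\Big(\bigcap_{k=1}^{\infty} A_k\Big) + \lim_{n\to\infty} \{\mu(A_1) - \mu(A_2) + \mu(A_2) - \mu(A_3) + \cdots + \mu(A_n) - \mu(A_{n+1})\} \\
&= \mu\Big(\bigcap_{k=1}^{\infty} A_k\Big) + \mu(A_1) - \lim_{n\to\infty} \mu(A_{n+1}).
\end{aligned}$$

Cancelling out $\mu(A_1)$ from both sides, we obtain

$$\mu\Big(\bigcap_{k=1}^{\infty} A_k\Big) = \lim_{n\to\infty} \mu(A_n).$$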

Remark: Note that the finiteness of $\mu(A_1)$ was needed in order to
use Theorem 12. However, one might ask whether some other argument
might give the desired result without assuming the finiteness of $\mu(A_1)$. The
answer is NO! (Why?) (Actually, the theorem clearly holds if the first
million $A$'s have infinite measure, provided that for some index $k$ the set
$A_k$ has finite measure. Of course, all the following $A$'s will then also have
finite measure.)

Definition 2: A null-set (not to be confused with the empty set) is a
set of measure zero. (From the definition of measurability and from the
definition of measure it is easily seen that $A$ is a null-set iff for every $\epsilon > 0$
there exists an open set $O$ of measure $< \epsilon$ such that $A \subseteq O$. Clearly, any
subset of a null-set is both measurable and a null-set. Also, by Theorem 4,
any finite or countably infinite union of null-sets is a null-set.)

Definition 3: If a particular set $A$ is under consideration and if
something happens at all points of $A$ with the exception of a null-set, we say
that this happens almost everywhere on $A$. For example, if a sequence of
functions diverges on a certain null-set $B \subseteq A$ and converges everywhere in
$A - B$, we say that the sequence converges almost everywhere in $A$ or at
almost all points of $A$. The abbreviation a.e. is often used, and occasionally
the French abbreviation p.p. (presque partout) is employed.

Definition 4: Two sets $E$ and $F$ are said to be equivalent if $E \,\Delta\, F$ is
a null-set, where $E \,\Delta\, F = (E - F) \cup (F - E)$. Of course, this use of the
word equivalent is entirely different from its use in set theory, where it
means that a one-to-one correspondence exists between two given sets.

Remark: If $E$ and $F$ are equivalent, then $F$ can be obtained by
removing a null-set from $E$ and then adding a null-set to the resulting set.
Of course, $E$ and $F$ can be interchanged in this statement. It therefore
follows without difficulty that this is indeed an equivalence relation. Note
that we do not assume that $E$ and $F$ are measurable in this definition.

Theorem 15: If $E$ and $F$ are equivalent and if either one is measurable,
so is the other, and their measures are equal.

Proof: Since $E - F$ and $F - E$ are subsets of $E \,\Delta\, F$, it follows
from the definition that they are both null-sets. Suppose $E$ is measurable;
then $E \cup (F - E)$ is measurable; but $E \cup (F - E) = E \cup F$, and so
$E \cup F$ is measurable. But $F = (E \cup F) - (E - F)$, so $F$ is measurable
by Theorem 9. Now, $\mu(E \cup F) = \mu(E) + \mu(F - E) = \mu(E)$, while
$\mu(F) = \mu(E \cup F) - \mu(E - F) = \mu(E \cup F)$. Hence, $\mu(E) = \mu(F)$. (Note that this
argument works even when $\mu(E \cup F) = +\infty$.)

Exercises 

1. (a) Show that a set $A$ is a null-set iff for every positive number $\epsilon$
it is possible to find a countably infinite collection $\{O_1, O_2, O_3, \ldots\}$
of open intervals such that $\sum_{k=1}^{\infty} \mu(O_k) < \epsilon$ and each
point of $A$ belongs to at least one of the $O$'s.

(b) Show that the preceding result remains correct if "at least one"
is replaced by "infinitely many."

Note that this exercise provides a definition of a null-set
which does not presuppose a knowledge of the concept of
measure, or even of a measurable set.

2. (a) Prove that measurability and measure are both translation-invariant;
that is, if $A$ is measurable and if $B$ is obtained by
adding to each member of $A$ a fixed number, then $B$ is measurable
and $\mu(B) = \mu(A)$.

(b) Suppose that a non-negative number $\nu(A)$ is associated with each
measurable set, that $\nu(A)$ is translation-invariant, and that it
possesses the countable additivity property (Theorem 11).
Prove that $\nu(A) = c\mu(A)$ for some fixed non-negative number $c$.

3. (a) Let $A = [0, 1)$ and let two members of $A$ be called equivalent
if their difference is rational. Prove that this definition
determines an equivalence relation on $A$.

(b) Let $A$ be decomposed into the equivalence classes determined
by the equivalence relation defined in (a), and let one member
be chosen from each equivalence class. Show that the set $S$
formed in this manner is not measurable. Hint: Suppose
that $S$ is measurable. For each rational number $r$ in $A$
(including zero) let $S_r$ be the set obtained by adding $r$ to each
member of $S$. By using a slight extension of part (a) of the
preceding exercise, show that $\bigcup_r S_r$ must have measure equal
to one, but that this leads to a contradiction of the assumption
$\mu(S) = 0$ and also of the assumption $\mu(S) > 0$.

Thus, the existence of an unmeasurable set has been
demonstrated.

4. A $G_\delta$ is a subset of R which can be expressed as the intersection of
a countably infinite collection of open sets, while an $F_\sigma$ is a
subset of R which can be expressed as the union of a countably
infinite collection of closed sets. Show that every measurable
set can be expressed as the difference of a $G_\delta$ and a null-set and
also as the disjoint union of an $F_\sigma$ and a null-set.


§4. MEASURABLE FUNCTIONS 

In the two preceding sections we have developed the essential parts 
of the theory of measure. In this section we develop the theory of measur- 
able functions, which forms the link between measure and integration. 

Definition 1: A real-valued function $f$ is said to be measurable if for
every real number $\alpha$ the set $\{x \mid f(x) < \alpha\}$ (which can also be written
$f^{-1}((-\infty, \alpha))$) is measurable.

Theorem 1: If $f$ is measurable, its domain $E$ is measurable.

Proof: Clearly, $E = \bigcup_{n=1}^{\infty} f^{-1}((-\infty, n))$; thus $E$ is a union of
countably many measurable sets, and is therefore measurable, by
Theorem 3-4.

Theorem 2: If $f$ satisfies any one of the following four conditions,
it satisfies the other three, and so any one of the four may be used in the
definition of measurability:

(a) $f^{-1}((-\infty, \alpha))$ is measurable for every real number $\alpha$;

(b) $f^{-1}((-\infty, \alpha])$ is measurable for every real number $\alpha$;

(c) $f^{-1}((\alpha, +\infty))$ is measurable for every real number $\alpha$;

(d) $f^{-1}([\alpha, +\infty))$ is measurable for every real number $\alpha$.

Proof: Suppose (a) is true. Clearly $f^{-1}((-\infty, \alpha])$ can be expressed
as $\bigcap_{n=1}^{\infty} f^{-1}((-\infty, \alpha + 1/n))$, so by Theorem 3-8 it follows that
$f^{-1}((-\infty, \alpha])$ is measurable. Next (still assuming that (a) is true), we
see that $f^{-1}((\alpha, +\infty)) = E - f^{-1}((-\infty, \alpha])$, and so $f^{-1}((\alpha, +\infty))$ is the
difference of two measurable sets. Next, we observe that $f^{-1}([\alpha, +\infty)) =
E - f^{-1}((-\infty, \alpha))$. Thus, $f^{-1}([\alpha, +\infty))$ is measurable if (a) is true.
Hence, (a) guarantees (b), (c), and (d). Similarly, each of (b), (c), or (d)
guarantees the three remaining conditions.

Definition 2: Two functions $f$ and $g$ are said to be equivalent if their
domains are equivalent and if they agree almost everywhere (on the
intersection of their respective domains).

Theorem 3: If $f$ and $g$ are equivalent and if one of them is
measurable, so is the other.

Proof: Trivial. 

Theorem 4: If $f$ and $g$ are both measurable (and are defined on the
same domain) and if $c$ is any real number, then the following functions are
also measurable:

(a) $cf$, (b) $f + g$, (c) $f - g$, (d) $f^2$, (e) $fg$.

Proof: (a) Trivial if $c = 0$. If $c > 0$,

$$\{x \mid cf(x) < \alpha\} = \{x \mid f(x) < \alpha/c\},$$

and so the left side is measurable. If $c < 0$,

$$\{x \mid cf(x) < \alpha\} = \{x \mid f(x) > \alpha/c\},$$

and so, again, the left side is measurable.

(b) Suppose that, for some $x$, $f(x) + g(x) < \alpha$. Then it is possible
to find rational numbers $s$ and $t$ such that $f(x) < s$, $g(x) < t$, and
$s + t < \alpha$. (This is so because the rationals are dense in $R$.) Conversely, if
$f(x) < s$ and $g(x) < t$ and $s + t < \alpha$, then, clearly, $f(x) + g(x) < \alpha$.
Thus,

$$\{x \mid f(x) + g(x) < \alpha\} = \bigcup_{\substack{s,\,t \text{ rational} \\ s + t < \alpha}} \{x \mid f(x) < s\} \cap \{x \mid g(x) < t\}.$$

Since $f$ and $g$ are measurable, each $\{\ \}$ appearing in the preceding union
is measurable, hence each set $\{\ \} \cap \{\ \}$ is measurable, and finally the
entire right side, being a union of countably many measurable sets, is
measurable. Hence, $f + g$ is measurable.

(c) By (a), $-g$ is measurable, and by (b) we see that $f + (-g)$ is
measurable.

(d) For $\alpha \le 0$, the set $\{x \mid f^2(x) < \alpha\}$ is empty, hence measurable.
For $\alpha > 0$, the set $\{x \mid f^2(x) < \alpha\}$ can be written as

$$\{x \mid f(x) < \sqrt{\alpha}\} \cap \{x \mid f(x) > -\sqrt{\alpha}\}.$$

Since each $\{\ \}$ is measurable (see Theorem 2), their intersection is
measurable, and so $f^2$ is measurable.

(e) Note that $fg = \tfrac{1}{4}\{(f + g)^2 - (f - g)^2\}$ and observe, with the aid
of (a), (b), (c), and (d), that the right side is measurable.

Theorem 5: Let $\{f_n\}$ be a finite or countably infinite collection of
measurable functions, all defined on a fixed domain. If this collection is
infinite, we assume that the set of numbers $\{f_n(x)\}$ is bounded for each $x$
(but not necessarily uniformly bounded on the entire domain). Then the
functions $\sup \{f_n\}$ and $\inf \{f_n\}$ are also measurable.

Proof: It obviously suffices to discuss $\sup \{f_n\}$. Call this function $f$.
Then $\{x \mid f(x) > \alpha\} = \{x \mid f_1(x) > \alpha\} \cup \{x \mid f_2(x) > \alpha\} \cup \cdots$. Each of
the sets $\{\ \}$ on the right is measurable, and, since only a finite or countably
infinite number of sets are involved, their union is measurable. Hence, $f$
is measurable.

Corollary: If $f$ is measurable, so is $|f|$.

Proof: Observe that $|f| = \sup \{f, -f\}$ and use Theorems 4 and 5.

The concept of limit superior (lim sup, $\overline{\lim}$) and limit inferior (lim inf,
$\underline{\lim}$) should be known to the reader; a brief presentation of this topic is
given in Appendix G, and it should be studied carefully at this time if the
reader is not thoroughly acquainted with it.

Theorem 6: Let $\{f_n\}$ be a sequence of measurable functions (all
defined on the same domain) and suppose that for each $x$ the sequence
$\{f_n(x)\}$ is bounded (above and below). Then the functions $\overline{\lim} f_n$ and
$\underline{\lim} f_n$ are also measurable.

Proof: Let $g_1 = \sup \{f_1, f_2, \ldots\}$, $g_2 = \sup \{f_2, f_3, \ldots\}$, and so forth.
By Theorem 5, $g_1, g_2, \ldots$ are measurable, and, again by Theorem 5,
we see that $\inf g_k$ is measurable. Now (cf. Appendix G), $\inf g_k = \overline{\lim} f_n$.
Thus $\overline{\lim} f_n$ is measurable, and, similarly, so is $\underline{\lim} f_n$.

Corollary: If the functions $f_1, f_2, \ldots$ are all measurable and converge
(to a finite limit) everywhere (on the common domain of all the $f_n$'s), the
limit function is also measurable.

Proof: If the given sequence converges everywhere, then the limit
coincides both with the $\overline{\lim}$ and the $\underline{\lim}$, each of which is measurable by
Theorem 6.





Theorem 7: If $f$ is measurable and if $A$ is a measurable subset of
the domain of $f$, then $f|_A$ (the restriction of $f$ to $A$) is also measurable.

Proof: Obvious. 


We now indicate briefly how the foregoing development must be 
modified in order to take account of functions which assume infinite
values. Such a modification is necessary, in particular, in order to be 
able to assign an upper limit and a lower limit to every sequence of real 
numbers and to every sequence of real-valued functions. 


Definition 3: The extended real number system, denoted $R^*$,
consists of $R$ together with the new symbols $+\infty$ and $-\infty$, with the obvious
ordering (i.e., for any real number $a$, $-\infty < a < +\infty$) and with the
obvious (or almost obvious) arithmetic: For any real number $a$, $a + (\pm\infty) =
\pm\infty$; $a \cdot (\pm\infty) = \pm\infty$ if $a > 0$, $a \cdot (\pm\infty) = \mp\infty$ if $a < 0$, $a \cdot (\pm\infty) = 0$
if $a = 0$; $(\pm\infty) + (\pm\infty) = \pm\infty$, $(\pm\infty) \cdot (\mp\infty) = -\infty$; $(\pm\infty) + (\mp\infty)$
is not defined. The topology (i.e., the collection of open sets) assigned to $R^*$
is a rather obvious extension of the topology of $R$ (which is determined by
the metric which we have defined on $R$ in Chapter 1). All sets of the form
$(a, +\infty]$ and all sets of the form $[-\infty, a)$ are declared to be open (for
every real number $a$), and a subset $A$ of $R^*$ is declared to be open iff $A$ can
be expressed as a union (not necessarily countable) $\bigcup B$, where each $B$ is a
finite intersection of the particular open sets described at the beginning of
this sentence.† (Note carefully that every open subset of $R$ is also an open
subset of $R^*$, but this is not true for closed subsets; for example, $R$ is a
closed subset of $R$, but not of $R^*$.)
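
The arithmetic conventions of Definition 3 can be tried out numerically. The following is only a minimal sketch, assuming that IEEE floating-point infinities stand in for the symbols $+\infty$ and $-\infty$; the helper names `ext_add` and `ext_mul` are hypothetical and are not part of the text. The one place where ordinary floating-point arithmetic disagrees with the definition is the product $0 \cdot (\pm\infty)$, which is taken to be $0$ here.

```python
import math

INF = math.inf  # stands in for the symbol +infinity of R*


def ext_add(a, b):
    """Sum in R*; (+inf) + (-inf) is left undefined, as in Definition 3."""
    if math.isinf(a) and math.isinf(b) and a != b:
        raise ValueError("(+inf) + (-inf) is not defined in R*")
    return a + b


def ext_mul(a, b):
    """Product in R*, using the convention 0 * (+-inf) = 0."""
    if a == 0 or b == 0:
        return 0.0  # plain float arithmetic would give nan here
    return a * b


print(ext_add(3.0, INF))    # a + (+inf)
print(ext_mul(-2.0, INF))   # a * (+inf) with a < 0
print(ext_mul(0.0, -INF))   # the convention 0 * (-inf) = 0
```

Raising an exception for $(+\infty) + (-\infty)$ is one possible design choice; it mirrors the fact that the definition assigns no value to that sum.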


Definition 4: A function defined on $R$, or on a subset of $R$, with
values in $R^*$, is called measurable if the inverse image of every set of the
form $(a, +\infty]$ and the inverse image of every set of the form $[-\infty, a)$ are
measurable (for every real number $a$). (Note carefully that the domain of
the function is still restricted to $R$; only the allowed range has been enlarged.
Also, there should be no difficulty in seeing that when the function does not
assume the values $+\infty$ or $-\infty$ the two definitions of measurability are
consistent.)

Remark: All the previous results about measurable functions
(Theorems 2 to 7) continue to remain valid, except that in parts (b) and
(c) of Theorem 4 we must restrict the domain so as to exclude the points
at which we have $(\pm\infty) - (\pm\infty)$ or $(\pm\infty) + (\mp\infty)$. We shall see that
this is a small price to pay for the advantages gained by working in $R^*$.

† The reader acquainted with general topology will note that the
collection of all sets of the form $(a, +\infty]$ or $[-\infty, a)$ is a subbase of
the topology of $R^*$.





Exercises 

1. Given any subset $E$ of $R$, $\chi_E$, the characteristic function of $E$, is
the function defined as follows: $\chi_E(x) = 1$ if $x \in E$, $\chi_E(x) = 0$
if $x \notin E$. Show that a set $E$ is measurable iff $\chi_E$ is measurable.

2. Let $E$ be any measurable subset of a finite interval $A$. Given any
positive number $\epsilon$, show that there exists a subset $E'$ of $A$
consisting of a finite collection of intervals such that $\mu(E \,\Delta\, E') < \epsilon$.

3. Let $E$ be any measurable subset of a finite interval $A$. Given any
positive number $\epsilon$, show that there exists a continuous
function $f$ which satisfies the conditions $0 \le f(x) \le 1$ everywhere
on $A$ and which coincides with $\chi_E$ except on a set of
measure less than $\epsilon$. (Hint: Use part (b) of Exercise 9-5 of
Chapter 1.)

4. A simple function is one whose range consists of a finite set of
real numbers. A step-function is a function which can be
expressed as a finite linear combination of characteristic
functions of intervals. (Note that a step-function must be
simple.) Show that if $f$ is any measurable simple function
defined on a finite interval $A$ and if any positive number $\epsilon$ is
given, there exist a step-function $g$ and a continuous function
$h$ such that $g$ and $h$ both agree with $f$ everywhere on $A$ except
for a subset of measure less than $\epsilon$. Furthermore, one may
impose on $g$ and $h$ the restriction that their ranges should lie
in the interval $[\min f, \max f]$.

5. Let the sequence of measurable functions $f_1, f_2, f_3, \ldots$, each
defined on a set $A$ of finite measure, converge everywhere in
$A$ to the function $f$. (All functions are assumed to take
values in $R$, not in $R^*$.) Given any positive numbers $\delta$ and $\epsilon$,
show that there exists an index $N$ such that the inequality

$$|f(x) - f_N(x)| < \epsilon$$

holds everywhere in $A$ except on a subset of measure less than $\delta$.

6. (Egoroff's Theorem): Prove the following strengthened version
of the preceding exercise: Under the same hypotheses as in
Exercise 5, for any positive number $\delta$ there exists a subset $B$
of $A$ such that $\mu(B) < \delta$ and the given sequence of functions
converges uniformly on $A - B$.

7. Prove that the condition that $\mu(A)$ be finite cannot be omitted in
either of the two preceding exercises.





8. (Lusin's Theorem): Let $f$ be bounded and measurable on a
finite interval $A$ and let a positive number $\delta$ be given. Show
that there exists a continuous function $g$ defined on $A$ such that
$f$ and $g$ agree everywhere except on a set of measure less than
$\delta$. Furthermore, one may impose on $g$ the condition that its
range should be contained in the interval $[\inf f, \sup f]$.
Hint: Approximate $f$ by a sequence of simple functions, then
approximate the simple functions by continuous functions,
and finally invoke Egoroff's theorem. (Actually, the conditions
that $f$ be bounded and that $\mu(A)$ be finite can be
omitted.)

Remark: Egoroff’s theorem can be stated, imprecisely but vividly, 
in the form : Every convergent sequence of measurable functions is almost 
uniformly convergent. Similarly, Lusin's theorem asserts that every 
measurable function is almost continuous, while Definition 3-1 characterizes
measurable sets as those which are almost open. These three approximate
truths constitute the “Three Principles” of Littlewood, who
emphasizes that they play a major role in the development of the theory 
of functions of a real variable. 


§5. INTEGRATION OF NON-NEGATIVE FUNCTIONS 

We are now in position to undertake the development of the Lebesgue
integral. Always keep in mind that, since we allow sets of infinite measure,
expressions of the form $a \cdot (+\infty)$ may be encountered, where $a$ is a
(finite) real number. In many treatments of the subject one begins with
functions defined on sets of finite measure and later removes this restriction,
but we believe it preferable to admit sets of infinite measure from the
very beginning. Also, recall that functions may assume values in $R^*$; i.e.,
$+\infty$ and $-\infty$ are permitted functional values.

Definition 1: A simple function is a function whose domain is
measurable and whose range consists of a finite number of finite non-negative
numbers. (Note that $+\infty$ is not allowed as a value of the function.)

It should be remarked that in some books the restriction to non-negative
values is not imposed. (Actually, the present definition does not
agree with that given in Exercise 4-4; the change has been made for
convenience, and no confusion should result.)

Theorem 1: Let $f$ be a non-negative measurable function, defined
on a set $A$ (which must be measurable, by Theorem 4-1). There exists a
sequence $s_1(x), s_2(x), \ldots$ of measurable simple functions (all defined on $A$)
such that:

(a) $0 \le s_1(x) \le s_2(x) \le \cdots$ everywhere on $A$,

(b) $s_n(x) \to f(x)$ everywhere on $A$.

Proof: For each positive integer $n$, form the set $F_n$ and the sets
$E_{n,1}, \ldots, E_{n,n \cdot 2^n}$ as follows:

$$F_n = \{x \mid f(x) \ge n\},$$

$$E_{n,k} = \left\{x \;\middle|\; \frac{k-1}{2^n} \le f(x) < \frac{k}{2^n}\right\}.$$

(Clearly, $A$ is the disjoint union of $F_n$ and the sets $E_{n,k}$, and all these sets
are measurable.) Now, let

$$s_n(x) = \begin{cases} n & \text{if } x \in F_n, \\ \dfrac{k-1}{2^n} & \text{if } x \in E_{n,k}. \end{cases}$$

The sequence of functions $s_1, s_2, \ldots$ clearly satisfies conditions (a) and
(b).
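
The construction in this proof is the usual dyadic truncation: values of $f$ at or above $n$ are replaced by $n$, and smaller values are rounded down to the nearest multiple of $2^{-n}$. The following minimal sketch evaluates such approximations pointwise; the helper `simple_approx` and the test function $f(x) = x^2$ are assumptions made purely for illustration.

```python
import math


def simple_approx(f, n):
    """Return s_n: equal to n where f(x) >= n, otherwise the largest
    multiple of 2**-n that does not exceed f(x)."""
    def s_n(x):
        y = f(x)
        if y >= n:
            return float(n)
        return math.floor(y * 2**n) / 2**n
    return s_n


f = lambda x: x * x  # hypothetical non-negative test function
x = 1.3
values = [simple_approx(f, n)(x) for n in range(1, 8)]
print(values)         # non-decreasing in n
print(f(x) - values[-1])  # smaller than 2**-7 once n exceeds f(x)
```

At a point where $f(x) = +\infty$ the sketch returns $s_n(x) = n$, which increases without bound, in agreement with condition (b).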


Definition 2: Let $s$ be a measurable simple function with domain $A$;
let the distinct values of $s$ be $\alpha_1, \alpha_2, \ldots, \alpha_n$; and let the sets $s^{-1}(\alpha_k)$ be
denoted $A_k$, so that $A$ is the disjoint union of the $A_k$'s and the $A_k$'s are
measurable. We define the integral of $s$ over $A$ as follows:

$$\int_A s = \sum_{k=1}^{n} \alpha_k \mu(A_k).$$

If $B$ is any measurable subset of $A$, we define $\int_B s$ to be $\int_B \tilde{s}$, where $\tilde{s}$ is the
restriction of $s$ to $B$. (Clearly $\tilde{s}$ is also simple and measurable, so this
definition makes sense.)

Theorem 2:

$$\int_B s = \sum_{k=1}^{n} \alpha_k \mu(B \cap A_k).$$

Proof: Trivial.
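
Definition 2 and Theorem 2 can be illustrated with a small computation. The sketch below assumes, purely for illustration, that each set $A_k$ is a finite disjoint union of intervals, so that its measure is a sum of lengths; the helpers `length`, `clip`, and `integral_simple` are hypothetical names, and the simple function used is an arbitrary example.

```python
import math


def length(intervals):
    """Measure of a finite disjoint union of intervals (a, b)."""
    return sum(b - a for a, b in intervals)


def clip(intervals, c, d):
    """Intersect a finite disjoint union of intervals with B = (c, d)."""
    out = []
    for a, b in intervals:
        lo, hi = max(a, c), min(b, d)
        if lo < hi:
            out.append((lo, hi))
    return out


def integral_simple(pieces):
    """pieces is a list of (alpha_k, A_k); returns sum of alpha_k * mu(A_k)."""
    total = 0.0
    for alpha, intervals in pieces:
        if alpha == 0:
            continue  # convention 0 * (+inf) = 0
        total += alpha * length(intervals)
    return total


# s = 2 on (0,1) u (3,4), s = 5 on (1,3), s = 0 on (4, +inf)
s = [(2.0, [(0.0, 1.0), (3.0, 4.0)]),
     (5.0, [(1.0, 3.0)]),
     (0.0, [(4.0, math.inf)])]

over_A = integral_simple(s)
over_B = integral_simple([(a, clip(A_k, 0.5, 2.0)) for a, A_k in s])
print(over_A, over_B)
```

The second computation is Theorem 2 with $B = (0.5, 2)$: each $A_k$ is replaced by $B \cap A_k$ before the weighted sum is formed.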

Theorem 3: Let $s$ be a simple measurable function defined on $A$
and let $A$ be the disjoint union of the measurable sets $B_1, B_2, \ldots$ (where we
allow either a finite or countably infinite number of $B$'s). Then

$$\int_A s = \sum_k \int_{B_k} s.$$

(If there are infinitely many $B$'s the convergence of the sum on the right is
part of the theorem.)

Proof: Trivial.

Theorem 4: If $s_1$ and $s_2$ are measurable simple functions defined on
$A$ and if $s_1 \ge s_2$ everywhere on $A$, then $\int_A s_1 \ge \int_A s_2$.

Proof: (a) Obvious from the definition if $s_1$ and $s_2$ are both constant
functions, or even if one of them is constant.

(b) If neither $s_1$ nor $s_2$ is constant, split $A$ into disjoint sets $B_1, B_2, \ldots$
(finitely many) on which both $s_1$ and $s_2$ are constant. (It is obvious how
to do this.) Then, using Theorem 3 and part (a) of the present proof, we
obtain

$$\int_A s_1 = \sum_k \int_{B_k} s_1 \ge \sum_k \int_{B_k} s_2 = \int_A s_2,$$

which is the desired result.

Definition 3: Let $f$ be any non-negative measurable function defined
on $A$. (Note that we allow $+\infty$ as a value of $f$.) Then we define

$$\int_A f = \sup \int_A s,$$

where $s$ ranges over all measurable simple functions satisfying $s \le f$ everywhere
on $A$. If $B$ is a measurable subset of $A$, we define

$$\int_B f = \sup \int_B s.$$

Of course, $\int_A f$ is called the integral of $f$ over $A$.

Theorem 5: The two preceding definitions are consistent; i.e., if $f$
is simple, these two definitions assign the same value to $\int_A f$.

Proof: (a) If $f$ is simple, it is one of the functions $s$ appearing in the
second definition; hence, $\int_A f$ (according to second definition) $\ge \int_A f$
(according to first definition).

(b) If $f$ is simple and $s \le f$ everywhere on $A$, then, by Theorem 4,
$\int_A s \le \int_A f$. Hence, $\int_A f$ (according to second definition) $\le \int_A f$
(according to first definition).

(c) Now we merely combine (a) and (b).





Theorem 6: Assuming that all sets involved are measurable and
that all functions are measurable and non-negative:

(a) If $f \le g$ everywhere, then $\int_E f \le \int_E g$.

(b) If $A \subset B$, then $\int_A f \le \int_B f$.

(c) If $c$ is a non-negative constant (even $+\infty$ allowed),

$$\int_E cf = c \int_E f.$$

(d) If $f \equiv 0$ on $E$, then $\int_E f = 0$, even if $\mu(E) = +\infty$.

(e) If $\mu(E) = 0$, then $\int_E f = 0$, even if $f = +\infty$ everywhere on $E$.

(f) $\int_E f = \int_R f\chi_E$. (Cf. Exercise 4-1.)

Proof: Trivial. Note that (f) shows that any integral may be
thought of as an integral over $R$.

Theorem 7: If $s$ and $t$ are simple functions, then $\int_E (s + t) =
\int_E s + \int_E t$. (From now on we avoid stating specifically the hypotheses
that the functions and the sets with which we are dealing are measurable.)

Proof: Practically a carbon copy of the proof of Theorem 4. (Note
that the sum of simple functions is also simple.)

Theorem 8 (Monotone Convergence Theorem): If $0 \le f_1 \le
f_2 \le \cdots$, and if $f = \lim f_n$, then $\int_A f = \lim_{n \to \infty} \int_A f_n$. (Note that $f$
is meaningful, and that it is measurable [by the corollary to Theorem 4-6];
also, by (a) of Theorem 6, $\int_A f_1 \le \int_A f_2 \le \cdots$, and so $\int_A f$ and
$\lim_{n \to \infty} \int_A f_n$ exist. The only problem is to show the equality.)

Proof: (a) By (a) of Theorem 6, $\int_A f \ge \int_A f_k$ for all indices $k$.
Hence, $\int_A f \ge \lim_{k \to \infty} \int_A f_k$, and so it remains only to prove the reverse
inequality, $\int_A f \le \lim_{k \to \infty} \int_A f_k$. For convenience let us denote
$\lim_{k \to \infty} \int_A f_k$ by the symbol $\alpha$. (Incidentally, if $\alpha = +\infty$ there is nothing more to
do, since $\int_A f \ge +\infty$ assures that $\int_A f = +\infty$. Thus, we may assume, if
we wish, that $\alpha < +\infty$, but the following argument even covers the case
$\alpha = +\infty$.)

(b) Choose any positive number $c$ less than 1 and any measurable
simple function $s$ which is everywhere $\le f$. (Remember that $s \ge 0$ everywhere.)
Let $A_n = \{x \mid f_n(x) \ge c\,s(x)\}$. The $A_n$'s are measurable and
$A = A_1 \cup A_2 \cup \cdots$. (Why?) Using (a), (b), and (c) of Theorem 6, we
obtain $\int_A f_n \ge \int_{A_n} f_n \ge \int_{A_n} (cs) = c \int_{A_n} s$, and so, letting $n \to \infty$, we
obtain (noting that $A_1 \subset A_2 \subset \cdots$)

$$\alpha \ge c \lim_{n \to \infty} \int_{A_n} s.$$

If $s$ = constant, $\int_{A_n} s$ = (constant value of $s$) $\mu(A_n) \to$ (constant value
of $s$) $\mu(A)$ (by Theorem 3-13) $= \int_A s$; if $s$ is not constant, a splitting
argument similar to the one used in Theorem 4 shows that it is still true
that $\int_{A_n} s \to \int_A s$. Thus, $\alpha \ge c \int_A s$. Since $s$ is any simple function $\le f$,
by taking account of the definition of $\int_A f$ we obtain $\alpha \ge c \int_A f$. Now,
letting $c \to 1$, we obtain $\alpha \ge \int_A f$, and so the proof is complete.

Theorem 9: If $f$ and $g$ are non-negative, $\int_A (f + g) = \int_A f + \int_A g$.
(Note that $f + g$ is measurable by Theorem 4-4; the $(+\infty) + (-\infty)$
difficulty cannot arise in this case.)

Proof: As in Theorem 1, construct a monotone sequence of simple
functions $s_1, s_2, \ldots$ converging to $f$ and a sequence $t_1, t_2, \ldots$ converging
to $g$. Then the sequence $(s_1 + t_1), (s_2 + t_2), \ldots$ converges monotonely
to $f + g$. By Theorems 7 and 8 we obtain

$$\int_A (f + g) = \lim_{n \to \infty} \int_A (s_n + t_n)
= \lim_{n \to \infty} \left( \int_A s_n + \int_A t_n \right)
= \lim_{n \to \infty} \int_A s_n + \lim_{n \to \infty} \int_A t_n = \int_A f + \int_A g.$$


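Theorem 10: If the functions $f_1, f_2, \ldots$ are non-negative, then
$\int_A \sum_{n=1}^{\infty} f_n = \sum_{n=1}^{\infty} \int_A f_n$. (By Theorem 4-4 and induction, $\sum_{n=1}^{N} f_n$ is
measurable, and then by the corollary to Theorem 4-6 the infinite sum
$\sum_{n=1}^{\infty} f_n$ is measurable.)

Proof: Let $f(x) = \sum_{n=1}^{\infty} f_n(x)$ and let $g_N(x) = \sum_{n=1}^{N} f_n(x)$. Clearly,
$g_N(x) \to f(x)$ everywhere and $0 \le g_1 \le g_2 \le \cdots$. By Theorems 8 and
9 and finite induction,

$$\int_A f = \lim_{N \to \infty} \int_A g_N = \lim_{N \to \infty} \sum_{n=1}^{N} \int_A f_n = \sum_{n=1}^{\infty} \int_A f_n,$$

and this is precisely the assertion of the theorem.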

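Theorem 11 (Fatou's Lemma): If $f_1, f_2, \ldots$ are non-negative
functions, then $\int_E \underline{\lim} f_n \le \underline{\lim} \int_E f_n$.

Proof: Let $g_1 = \inf \{f_1, f_2, \ldots\}$, $g_2 = \inf \{f_2, f_3, \ldots\}$, and so forth.
Clearly, $g_k \le f_k$, and so $\int_E g_k \le \int_E f_k$. Hence, $\underline{\lim} \int_E g_k \le \underline{\lim} \int_E f_k$.
However, since $g_1 \le g_2 \le g_3 \le \cdots$, $\underline{\lim} \int_E g_k = \lim_{k \to \infty} \int_E g_k$,
and this, according to Theorem 8, is the same as $\int_E (\lim g_k)$. Hence,
$\int_E (\lim g_k) \le \underline{\lim} \int_E f_k$. From the definition of $\underline{\lim}$, we see that $\lim_{k \to \infty} g_k =
\underline{\lim} f_k$, and so the last inequality furnishes the desired result.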
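
The inequality in Fatou's lemma can be strict. A minimal sketch, assuming the hypothetical sequence in which $f_n$ is the characteristic function of $[0, 1)$ for even $n$ and of $[1, 2)$ for odd $n$: every $f_n$ has integral 1, while $\underline{\lim} f_n$ vanishes identically. The midpoint sum used below is only a numerical stand-in for the integral over $[0, 2)$, adequate because the integrands are step functions.

```python
def f(n, x):
    """f_n = indicator of [0, 1) for even n, of [1, 2) for odd n."""
    lo = 0.0 if n % 2 == 0 else 1.0
    return 1.0 if lo <= x < lo + 1.0 else 0.0


def integral(g, a=0.0, b=2.0, steps=2000):
    """Midpoint sum over [a, b); a numerical stand-in for the integral."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h


# The sequence has period 2 in n, so the lim inf over n is the
# minimum over one period.
liminf_f = lambda x: min(f(0, x), f(1, x))

lhs = integral(liminf_f)                      # integral of lim inf f_n
rhs = min(integral(lambda x: f(0, x)),
          integral(lambda x: f(1, x)))        # lim inf of the integrals
print(lhs, rhs)
```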

Exercise 

1. Let $A_1, A_2, \ldots$ be a sequence of measurable subsets of $R$ and suppose
that $\sum_{n=1}^{\infty} \mu(A_n)$ is finite. Prove that almost all points of $R$
belong to only finitely many $A_n$'s.


§6. INTEGRATION OF REAL-VALUED 
AND COMPLEX-VALUED FUNCTIONS 

We now proceed to eliminate the restriction, which was imposed 
throughout the preceding section, that the functions involved may assume 
only non-negative values. First, we consider functions assuming values 
in 

Definition 1: $f^+ = \max(f, 0)$, $f^- = \max(-f, 0)$. (Note that $f^-$
is a non-negative function, even though $f^-$ is often called the negative part
of $f$.) Obviously $f = f^+ - f^-$. Note that if $f$ is measurable, so are $f^+$ and $f^-$.

Definition 2: Suppose that $f$ is measurable and does not assume
either of the values $\pm\infty$. (We shall later see that this restriction can be
removed, but at this point it would be slightly bothersome to do so.) If
$\int f^+$ and $\int f^-$ are both finite, we define $\int f$ to be $\int f^+ - \int f^-$, and we say
that $f$ is integrable, or summable (over the set on which we are integrating).
Note that, if $f \ge 0$ everywhere, $f^- \equiv 0$, and the new definition of $\int f$ agrees
with the original definition.

Remark: Note that if $f$ is integrable, so is $|f|$; in fact, $\int |f| =
\int f^+ + \int f^-$. This is in marked contrast to improper Riemann integration;
for example, in the Riemann theory we assign the value $\pi/2$ to $\int_0^\infty (\sin x / x)\,dx$,
since $\lim_{A \to \infty} \int_0^A (\sin x / x)\,dx$ exists and equals $\pi/2$. Since $\int_0^\infty |\sin x / x|\,dx = +\infty$
(i.e., $\sin x / x$ is not summable), the statement $\int_0^\infty (\sin x / x)\,dx = \pi/2$ is false in
the Lebesgue theory. (Cf. Exercise HA.) (Since integrals of this type
play an important role in Fourier analysis, we see that the Lebesgue
theory is not quite a panacea!)
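
The contrast described in this remark can be observed numerically. The sketch below is illustrative only: it uses a midpoint rule (the helper `midpoint` and the step density of 200 points per unit length are arbitrary assumptions) to compare $\int_0^A (\sin x / x)\,dx$ with $\int_0^A |\sin x / x|\,dx$ for growing $A$. The first settles near $\pi/2$; the second keeps growing, in line with the elementary bound $\int_0^{N\pi} |\sin x / x|\,dx \ge (2/\pi) \sum_{k=1}^{N} 1/k$.

```python
import math


def midpoint(g, a, b, steps):
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h


sinc = lambda x: math.sin(x) / x  # midpoints never hit x = 0


def signed(A):
    return midpoint(sinc, 0.0, A, int(200 * A))


def absolute(A):
    return midpoint(lambda x: abs(sinc(x)), 0.0, A, int(200 * A))


for N in (10, 100, 1000):
    A = N * math.pi
    print(N, signed(A), absolute(A))
```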

Theorem 1: Let $f$ and $g$ be summable. Then (a) $\alpha f$ is summable for
any real constant $\alpha$, and $\int (\alpha f) = \alpha \int f$; (b) $f + g$ is summable, and
$\int (f + g) = \int f + \int g$. (More generally, $\int (\alpha f + \beta g) = \alpha \int f + \beta \int g$.)


Proof: (a) Trivial.

(b) $f + g$ is measurable. Since $|f + g| \le |f| + |g|$, it follows that
$(f + g)^+$ and $(f + g)^-$ are non-negative and $\le |f| + |g|$. Thus, by
earlier results, $\int (f + g)^+ \le \int (|f| + |g|) = \int |f| + \int |g| < +\infty$, and similarly
for $(f + g)^-$. Thus, $f + g$ is summable. Let $h = f + g$ and form
$h^+$ and $h^-$. (Note carefully that $h^+ = f^+ + g^+$ is not always true!!) Then

$$h^+ - h^- = f^+ - f^- + g^+ - g^-,$$

and so

$$h^+ + f^- + g^- = h^- + f^+ + g^+.$$

In this equation everything in sight is $\ge 0$, so by earlier results we may
integrate term-by-term, obtaining

$$\int h^+ + \int f^- + \int g^- = \int h^- + \int f^+ + \int g^+,$$

and then we may transpose (since all quantities are finite!!):

$$\int h^+ - \int h^- = \int f^+ - \int f^- + \int g^+ - \int g^-,$$

or

$$\int h = \int f + \int g,$$

or

$$\int (f + g) = \int f + \int g,$$

and the proof is complete.

Theorem 2: If $f$ is summable, $|\int f| \le \int |f|$.

Proof: Since $f^+$ and $f^-$ are non-negative, their integrals are $\ge 0$,
and so

$$-\int |f| = -\int (f^+ + f^-) = -\int f^+ - \int f^- \le \int f^+ - \int f^- = \int f
\le \int f^+ + \int f^- = \int (f^+ + f^-) = \int |f|.$$

This is equivalent (disregarding the intermediate terms) to $|\int f| \le \int |f|$,
which is just the assertion of the theorem.

Theorem 3 (Lebesgue Dominated Convergence Theorem):
Suppose that the functions $f_1, f_2, \ldots$ are measurable, that $|f_n(x)| \le g(x)$
everywhere, $g(x)$ being summable, and suppose that $f_n(x) \to f(x)$ everywhere.
Then (a) $f$ is summable; (b) $\int |f - f_n| \to 0$; (c) $\lim_{n \to \infty} \int f_n$ exists;
and (d) the limit appearing in (c) coincides with $\int f$; that is, $\int \lim f_n =
\lim \int f_n$.

Proof: (a) By earlier results, $f$ is measurable. Since each $|f_k(x)|$ is
$\le g(x)$, it follows that $|f(x)| \le g(x)$, and so $f^+ \le g$ and $f^- \le g$ everywhere.
Thus, $0 \le \int f^{\pm} \le \int g < +\infty$ (by (a) of Theorem 5-6), and so $f$
is summable.

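(b) The functions $2g - |f_n - f|$ are measurable and non-negative
and converge to $2g$; in particular, $\underline{\lim} \{2g - |f_n - f|\} = 2g$, and by
Fatou's lemma

$$\int 2g \le \underline{\lim}_{n \to \infty} \int \{2g - |f_n - f|\}.$$

Now, $|f_n - f|$ is summable, since it is measurable, non-negative, and
$\le 2g$ (use (a) of Theorem 5-6), and so, by Theorem 1,

$$\int \{2g - |f_n - f|\} = \int 2g - \int |f_n - f|.$$

Since $\int 2g$ is independent of the index $n$,

$$\underline{\lim}_{n \to \infty} \int \{2g - |f_n - f|\} = \underline{\lim}_{n \to \infty} \left\{ \int 2g - \int |f_n - f| \right\}
= \int 2g - \overline{\lim}_{n \to \infty} \int |f_n - f|.$$

(Note carefully the change from $\underline{\lim}$ to $\overline{\lim}$, because of the minus sign!)
Thus, $\int 2g \le \int 2g - \overline{\lim}_{n \to \infty} \int |f_n - f|$; since $\int 2g$ is finite, we may
cancel and transpose, obtaining

$$\overline{\lim}_{n \to \infty} \int |f_n - f| \le 0.$$

But the quantities $\int |f_n - f|$ are $\ge 0$ by their very nature, and so
$\underline{\lim}_{n \to \infty} \int |f_n - f| \ge 0 \ge \overline{\lim}_{n \to \infty} \int |f_n - f|$. Since $\underline{\lim} \le \overline{\lim}$, we must
have $\underline{\lim} = 0 = \overline{\lim} = \lim$. Thus, $\lim_{n \to \infty} \int |f_n - f| = 0$, which is (b).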

(c) and (d): (b) guarantees that for every positive $\epsilon$ there exists $N(\epsilon)$
such that whenever $n > N(\epsilon)$ the inequality $\int |f - f_n| < \epsilon$ holds. By
Theorem 2, $|\int (f - f_n)| < \epsilon$, and by Theorem 1 $|\int f - \int f_n| < \epsilon$. But
this says that $\int f_n \to \int f$, which contains both (c) and (d).





It should be entirely evident now that summability, value of an
integral, and so forth are not affected by altering a function on a set of
measure zero; we may therefore change a function on a null-set, or
disregard it on a null-set, whenever we find it convenient. For example, in
the Lebesgue dominated convergence theorem, we may replace the word
“everywhere” in “$f_n(x) \to f(x)$ everywhere” and “$|f_n(x)| \le g(x)$ everywhere”
by “almost everywhere.” In fact, we may speak of $\int_A f$ even if $f$
is defined only almost everywhere on $A$; we simply integrate over $A$
minus a suitable null-set, or else we may extend the definition of $f$ to all
of $A$ in any way we please.

We conclude this section by indicating briefly how the entire theory of
integration which we have developed is extended to complex-valued
functions (defined on a subset of $R$). Let $f$ be a complex-valued function
defined on a set $A$, and let $g$ and $h$ be the real and imaginary parts of $f$,
respectively. Then $f$ is said to be measurable iff both $g$ and $h$ are measurable;
similarly, $f$ is said to be summable iff both $g$ and $h$ are summable, and when
these conditions are satisfied the integral $\int_A f$ is defined as $\int_A g + i \int_A h$.
The entire theory then extends, with obvious modifications, to the
integration of complex-valued functions.

Exercises 

1. Prove that a complex-valued function $f$ is summable iff $f$ is
measurable and $|f|$ is summable.

2. (a) If $f$ is real-valued and summable, prove that $|\int f| = \int |f|$ iff
either $f \ge 0$ almost everywhere or $f \le 0$ almost everywhere.

(b) If $f$ is complex-valued and summable, prove that $|\int f| = \int |f|$ iff
there exists a constant $c$ of unit modulus such that $f = c\,|f|$
almost everywhere.

3. If $f$ is a summable function, either real-valued or complex-valued,
defined on a set $A$, show that for every positive number $\epsilon$
there exists a positive number $\delta$ such that, whenever $B$ is a
subset of $A$ whose measure is less than $\delta$, the inequality
$\int_B |f| < \epsilon$ (and hence the inequality $|\int_B f| < \epsilon$) holds. (Of
course, this is trivial if $|f|$ is bounded.)

§7. INTEGRATION OVER PLANE SETS 

In attempting to extend the theory of measure and integration from
the real line, $R$, to the plane, $R \times R$, one encounters at the very beginning
the difficulty that for open sets in the plane there is no canonical decomposition
analogous to that which exists for open sets in $R$, as stated in §2
and explained in Appendix E. It is simply not true, for example, that every
open subset of the plane can be expressed as the union of a finite or
countably infinite collection of disjoint open squares (or rectangles).
This becomes evident as soon as one considers a circular disc or the
domain formed by removing a square from one corner of a larger square.
Fortunately, however, a suitable decomposition theorem does exist.

Theorem 1: Every non-empty open subset of the plane can be
expressed as the union of a countably infinite collection of closed squares 
whose interiors are disjoint. Although this type of decomposition is not 
unique, the sum of the areas of the squares employed is the same for all 
decompositions. 

We shall merely sketch a proof of the first half of this theorem; for 
a complete proof of the entire theorem, and also of Theorem 2, the reader 
is referred to [Hartman-Mikusinski, Chapters 11 and 12]. A uniformly 
spaced mesh of horizontal and vertical lines is constructed, the spacing 
being sufficiently fine that at least one of the closed squares defined by the 
mesh is contained in the given open set O. To each such square the label 
“1” is attached. The mesh is then refined by constructing lines midway 
between each successive pair of lines of the original mesh. To those 
squares (if any exist) of the refined mesh which are contained in O but are 
not subsets of the squares already labeled, the label “2” is attached. As
this procedure is repeated indefinitely, a countably infinite collection of 
closed squares is obtained whose interiors are disjoint and whose union 
is precisely the given open set O. In Figure 1 the first three stages of the 
decomposition are illustrated. 
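
The mesh-refinement procedure sketched above can be imitated on a computer for a specific open set. The following minimal sketch takes the open unit disc as the open set $O$ (an assumption made only for illustration), starts from a mesh of spacing 1, which is coarser than the text requires, so that the first labels appear only after some refinement, and stops after a fixed number of refinements. The function name `decompose_disc` is hypothetical. The accepted closed squares have disjoint interiors, and the sum of their areas increases toward the area $\pi$ of the disc.

```python
import math


def decompose_disc(max_level):
    """Closed dyadic squares with disjoint interiors inside the open unit disc.

    Returns (x, y, side) triples, (x, y) being the lower-left corner."""
    accepted = []

    def visit(x, y, side, level):
        # Squared distances from the origin to the farthest and nearest
        # points of the closed square.
        far2 = max(abs(x), abs(x + side)) ** 2 + max(abs(y), abs(y + side)) ** 2
        nx = 0.0 if x <= 0.0 <= x + side else min(abs(x), abs(x + side))
        ny = 0.0 if y <= 0.0 <= y + side else min(abs(y), abs(y + side))
        if far2 < 1.0:
            accepted.append((x, y, side))   # contained in the disc: label it
        elif nx * nx + ny * ny >= 1.0:
            return                          # misses the open disc entirely
        elif level < max_level:
            half = side / 2.0               # straddles the circle: refine
            for dx in (0.0, half):
                for dy in (0.0, half):
                    visit(x + dx, y + dy, half, level + 1)

    for x in (-1.0, 0.0):
        for y in (-1.0, 0.0):
            visit(x, y, 1.0, 0)
    return accepted


areas = [sum(s * s for _, _, s in decompose_disc(n)) for n in (2, 4, 6, 8)]
print(areas, math.pi)
```

Because all corner coordinates are dyadic rationals, the containment tests are exact in floating-point arithmetic; only the truncation at `max_level` separates the computed sum from the full countable decomposition.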

[Figure 1.]

The two parts of the preceding theorem, taken together, justify the
assignment to every plane open set $O$ of a number (possibly $+\infty$) $\mu(O)$,
defined as the sum of the areas of the squares used in any decomposition
of $O$ of the type described in the statement of the theorem. It is not
difficult to show that if $O$ is a square (or rectangle), then $\mu(O)$ coincides
with the area of $O$. (Cf. Exercise 1.) It is now possible to repeat the
entire chain of arguments employed in the preceding sections of this
chapter to obtain a satisfactory theory of measure and integration in the
plane. The integral of a (summable) function $f$ over a (measurable) set
$A$ will be denoted $\iint_A f$ or $\iint_A f(x, y)\,dx\,dy$.

The following question and its answer are of fundamental significance
in many problems of analysis: If $f$ is summable over the rectangle $A$,
defined by the inequalities* $a < x < b$ and $c < y < d$, is the value of the
integral $\iint_A f$ equal to that of the iterated integral $\int_c^d \{\int_a^b f(x, y)\,dx\}\,dy$
and of the iterated integral $\int_a^b \{\int_c^d f(x, y)\,dy\}\,dx$? In fact, it is far from
obvious that the existence of the plane integral guarantees the existence
of either iterated integral. To take a rather trivial example, let $C$ be a
non-measurable subset of the interval $(-1, 1)$, let $A$ be the open square
$(-1, 1) \times (-1, 1)$, and let $f$ be defined in $A$ as the characteristic function
of the point-set $\tilde{C}$ defined by the pair of conditions $x \in C$, $y = 0$. It is
extremely important to note that, while $C$ is a non-measurable subset of $R$,
$\tilde{C}$ is a measurable subset of $R \times R$; in fact, $\mu(\tilde{C}) = 0$. Thus $\iint_A f = 0$;
also, $\int_{-1}^{1} f(x, y)\,dy = 0$ for all $x$ in $(-1, 1)$, and hence

$$\int_{-1}^{1} \left\{ \int_{-1}^{1} f(x, y)\,dy \right\} dx = 0.$$

On the other hand, $\int_{-1}^{1} f(x, y)\,dx = 0$ if $y \ne 0$, but $\int_{-1}^{1} f(x, 0)\,dx$ does
not exist (since $C$ is not measurable). However, since $\int_{-1}^{1} f(x, y)\,dx$ is
defined for almost all values of $y$ and is summable, the iterated integral
$\int_{-1}^{1} \{\int_{-1}^{1} f(x, y)\,dx\}\,dy$ exists and equals zero. This example, simple
though it is, serves to indicate the difficulties which must be overcome in
establishing the following celebrated theorem.

Theorem 2 (Fubini): Let $f$ be summable over the rectangle
$A = (a, b) \times (c, d)$. Then $\int_a^b f(x, y)\,dx$ exists for almost all values of $y$ in
$(c, d)$, and the function defined by this integral is summable, so that the
iterated integral $\int_c^d \{\int_a^b f(x, y)\,dx\}\,dy$ exists; similarly, the iterated integral
$\int_a^b \{\int_c^d f(x, y)\,dy\}\,dx$ exists. The plane integral $\iint_A f$ and the two iterated
integrals all have the same value.

Exercise 2 shows that the existence and equality of the two iterated 
integrals does not guarantee the existence of the plane integral, while 
Exercise 3 shows that it is possible that only one of the two iterated 
integrals may exist. Finally, Exercise 4 shows that both iterated integrals 
may exist and yet have different values. 

* Nothing is altered by replacing one or more of the strict inequalities by $\le$.
Also, $a$ and $c$ may equal $-\infty$ while $b$ and $d$ may equal $+\infty$.





Exercises 

1. Let $A$ be the open rectangle $(a, b) \times (c, d)$. Show that $\mu(A) =
(b - a) \cdot (d - c)$, as is to be expected.

2. Let $A$ be the open square $(-1, 1) \times (-1, 1)$ and let


(1 - |x|)^ + (l - \y\f 

Show that the two iterated integrals exist and equal zero, but
that $\iint_A f$ does not exist.

3. Let $A$ be the open square $(-1, 1) \times (-1, 1)$ and let $f(x, y) =
x/(1 - y^2)$. Show that $\int_{-1}^{1} \{\int_{-1}^{1} f(x, y)\,dx\}\,dy$ exists and equals
zero, while $\int_{-1}^{1} \{\int_{-1}^{1} f(x, y)\,dy\}\,dx$ does not exist.

4. Let A be the open square (0, 1) X (0, 1) and let f{x,y) = j 0, 

or — / ^/2(1 — y) according as ^ > x, = .x*, or j < x. 
Show that both iterated integrals exist but that their values 
are different. 


§8. CONCLUDING REMARKS 

We have now completed the development of the essential parts of the 
theory of Lebesgue measure and integration, at least to the extent that 
we shall need it. In this section we present a few remarks that may prove 
of interest and value to the reader. 

(A) Suppose that $f$ is a continuous function defined on a bounded
closed interval $[a, b]$. Then it is very easy to show that $\int_a^b f(x)\,dx$ has the
same value whether interpreted as a Riemann integral or as a Lebesgue
integral. We shall use this fact occasionally without specific mention.
By a slightly more delicate argument one can show that the aforementioned
equality holds if $f$ is any function which is integrable in the sense of Riemann.
(There is no contradiction, of course, between this statement and the
remark following Definition 6-2, since the integral mentioned there is
extended over an infinite interval.) It is also of interest to note the
following theorem, which demonstrates a remarkable connection between
Riemann integration and Lebesgue measure: A function $f$ defined on a
bounded closed interval is integrable in the sense of Riemann iff $|f|$ is
bounded and the points of discontinuity of $f$ form a null-set.

(B) The entire theory can be extended in the following way. Let a
non-empty set $X$ be given, and let a sigma-algebra (also called Borel field)
of subsets of $X$ be given. This means that we have a non-empty class $\mathcal{B}$
(for “Borel”) of subsets of $X$ which satisfy the following conditions:
(a) The complement of any member of $\mathcal{B}$ is also a member of $\mathcal{B}$. (b) The
union of finitely many or countably many members of $\mathcal{B}$ is also a member
of $\mathcal{B}$. (c) $\emptyset$ and $X$ are members of $\mathcal{B}$. (d) The intersection of finitely
many or countably many members of $\mathcal{B}$ is also a member of $\mathcal{B}$. (In fact,
(c) and (d) are consequences of (a) and (b), but for simplicity we have
included them.) Furthermore, suppose that to each set $A$ belonging to $\mathcal{B}$
we associate a non-negative number ($+\infty$ allowed), $\mu(A)$, satisfying the
following conditions: (a) $\mu(\emptyset) = 0$; (b) $\mu(\bigcup_k A_k) = \sum_k \mu(A_k)$ for any
finite or countable disjoint union of members of $\mathcal{B}$; (c) if $A_1 \subset A_2$, if
$A_2 \in \mathcal{B}$, and if $\mu(A_2) = 0$, then $A_1 \in \mathcal{B}$ and $\mu(A_1) = 0$. (In fact, $\mu(A_1) = 0$
is a consequence of the other assumptions.)

Such a function $\mu$ is called a measure on the class $\mathcal{B}$, and the entire
theory which we have built up remains valid in this more general context.
In particular, probability theory depends in an essential way on the
theory of measures defined on Borel fields.

(Strictly speaking, the condition that every subset of a set of measure
zero should belong to $\mathcal{B}$ is not imposed in the definition of a measure;
when this condition is satisfied the measure is said to be complete.
However, every incomplete measure defined on a Borel field can be
extended to a larger Borel field so that the new measure is complete.)
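
The axioms in (B) can be checked mechanically on a finite example. The following is only a sketch under stated assumptions: the three-point set `X`, the point masses, and the helper names (`powerset`, `measure`, `is_sigma_algebra`, `is_additive`, `is_complete`) are hypothetical choices made for this illustration and do not come from the text. It compares the class of all subsets of `X` with a coarser Borel field on which the same measure fails to be complete.

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(c) for c in subsets}

def measure(weights):
    """Weighted counting measure: mu(A) is the sum of the point masses in A."""
    return lambda a: sum(weights[x] for x in a)

def is_sigma_algebra(cls, space):
    # For a finite class, closure under pairwise unions already gives
    # closure under countable unions.
    return (frozenset() in cls and space in cls
            and all(space - a in cls for a in cls)
            and all(a | b in cls for a in cls for b in cls))

def is_additive(cls, mu):
    # mu(A u B) = mu(A) + mu(B) for disjoint members A, B.
    return all(abs(mu(a | b) - mu(a) - mu(b)) < 1e-12
               for a in cls for b in cls if not (a & b))

def is_complete(cls, mu):
    # Every subset of a member of measure zero must itself be a member.
    return all(sub in cls for a in cls if mu(a) == 0 for sub in powerset(a))

X = frozenset({1, 2, 3})
mu = measure({1: 1.0, 2: 0.0, 3: 0.0})   # hypothetical point masses
full = powerset(X)                        # all subsets of X
coarse = {frozenset(), frozenset({1}), frozenset({2, 3}), X}
```

On `coarse` the set {2, 3} has measure zero while its subset {2} is not a member, so the measure is incomplete there; enlarging the class to `full` gives a complete measure, in line with the closing remark of (B).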



CHAPTER 3 


THE $L^p$- AND $l^p$-SPACES


All functions under discussion in the first five sections of this chapter
are complex-valued and defined on a fixed measurable subset $A$ of $R$.
We assume that $\mu(A)$ is positive ($+\infty$ allowed), for if $\mu(A) = 0$ the entire
content would become vacuous. In the first four sections the number $p$
stands for a fixed finite number $\ge 1$, and in §5 it will be indicated briefly
how the material developed in the first four sections can be extended to
the case $p = +\infty$.


§1. BASIC CONCEPTS 

Definition 1: Suppose that $f$ is measurable on $A$ and that $|f|^p$ is
summable over $A$. (Note carefully that the measurability of $f$ guarantees
the measurability, but not the summability, of $|f|^p$. Cf. Exercises 1 and 2.)
Then $f$ is called p-th power summable over $A$. (When $p = 1$ this is the same
as summable; when $p = 2$ we use the expressions square integrable or
quadratically integrable.)

Definition 2: The class of all p-th power summable functions is
denoted $L^p(A)$; since $A$ is fixed, we usually employ the simpler notation $L^p$.
If $f \in L^p$, the non-negative finite number $(\int_A |f|^p)^{1/p}$ is denoted $\|f\|_p$. (From
now on we write $\int$ instead of $\int_A$.)

Theorem 1: (a) If $\alpha$ is any complex number and if $f \in L^p$, then
$\alpha f \in L^p$ and $\|\alpha f\|_p = |\alpha| \cdot \|f\|_p$.

(b) $\|f\|_p = 0$ iff $f = 0$ almost everywhere.





Proof: Trivial. 


The following theorem, despite its simplicity, plays a vital role in the 
theory to be developed later in this chapter, and we present two quite 
distinct proofs. 

Theorem 2: Let $x$ and $y$ be non-negative real numbers and let
$0 < r < 1$. Then $x^r y^{1-r} \le rx + (1 - r)y$, and equality holds iff $x = y$.


First Proof: Since the assertions of this theorem are obvious if
either $x$ or $y$ vanishes, we confine attention to the case $x > 0$, $y > 0$. Let
$a$ and $b$ be positive numbers. Then $0 \le (\sqrt{a} - \sqrt{b})^2 = a + b - 2\sqrt{ab}$,
and so $\sqrt{ab} \le \tfrac{1}{2}(a + b)$. (This is, of course, the familiar inequality
between the geometric and arithmetic means.) Replacing $a$ by $\sqrt{c_1 c_2}$
and $b$ by $\sqrt{c_3 c_4}$, where the $c$'s are arbitrary positive numbers, we obtain

$$(c_1 c_2 c_3 c_4)^{1/4} \le \tfrac{1}{2}\{\sqrt{c_1 c_2} + \sqrt{c_3 c_4}\} \le \tfrac{1}{2}\{\tfrac{1}{2}(c_1 + c_2) + \tfrac{1}{2}(c_3 + c_4)\} = \tfrac{1}{4}(c_1 + c_2 + c_3 + c_4).$$

Repeating this procedure any finite number of times
we obtain, for any positive integer $n$ and positive numbers $a_1, a_2, \ldots, a_{2^n}$,

$$(a_1 a_2 \cdots a_{2^n})^{1/2^n} \le \frac{1}{2^n}(a_1 + a_2 + \cdots + a_{2^n}).$$

If we set $k$ of the $a$'s equal to $x$ and the remaining $a$'s equal to $y$ (where
$0 < k < 2^n$), we obtain

$$x^{k/2^n} y^{1 - k/2^n} \le \frac{k}{2^n}\,x + \Bigl(1 - \frac{k}{2^n}\Bigr) y.$$

Thus, the desired inequality has been established in the particular case
that $r$ is a binary rational (a rational number which can be written as a
fraction whose denominator is a power of 2) in the open interval $(0, 1)$.
Since the binary rationals are dense in $R$, we now conclude by a trivial
continuity argument that the desired inequality holds for any value of $r$
in $(0, 1)$. Finally, by working backwards through the entire argument,
we see that equality holds iff $x = y$, as asserted.


Second Proof: Let the function $f$ be defined as follows: $f(t) =
t^r - 1 + r - rt$ for $t \ge 0$. Then $f$ is continuous, while for positive
values of $t$ it possesses derivatives of all orders. In particular, $f'(t) =
rt^{r-1} - r$ and $f''(t) = r(r - 1)t^{r-2} < 0$. The latter equation shows that
the function $f$ is concave, and from the first equation it follows that
$t = 1$, and no other value of $t$, furnishes the maximum value of $f$. Thus,
$f(t) < f(1)$, or $t^r < (1 - r) + rt$, for $t$ positive and different from unity.
Replacing $t$ by $x/y$, where $x$ and $y$ are distinct positive numbers, we obtain
$x^r y^{-r} < (1 - r) + rx/y$, or $x^r y^{1-r} < rx + (1 - r)y$, while equality must
hold when $x = y$.
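
Theorem 2 lends itself to a quick numerical sanity check. The sketch below is not part of either proof; the helper names (`weighted_gm`, `weighted_am`, `gap`) and the random sampling ranges are assumptions made for illustration. It evaluates $rx + (1 - r)y - x^r y^{1-r}$ on random inputs and records the smallest value found, which should be non-negative up to rounding.

```python
import random

def weighted_gm(x, y, r):
    """Left side of Theorem 2: x^r * y^(1 - r)."""
    return (x ** r) * (y ** (1 - r))

def weighted_am(x, y, r):
    """Right side of Theorem 2: r*x + (1 - r)*y."""
    return r * x + (1 - r) * y

def gap(x, y, r):
    """Right side minus left side; Theorem 2 says this is >= 0."""
    return weighted_am(x, y, r) - weighted_gm(x, y, r)

random.seed(0)  # fixed seed so the sample is reproducible
samples = [(random.uniform(0, 10), random.uniform(0, 10),
            random.uniform(0.01, 0.99)) for _ in range(1000)]
worst = min(gap(x, y, r) for x, y, r in samples)  # smallest observed gap
```

The gap vanishes when the two arguments coincide, for instance `gap(3.0, 3.0, 0.3)`, and is strictly positive otherwise, for instance `gap(1.0, 4.0, 0.5)`, which is the ordinary inequality between the geometric and arithmetic means of 1 and 4.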





Definition 3: If $1 < p < +\infty$, the conjugate number of $p$ is the
number $p/(p - 1)$; we always denote the conjugate of $p$ by $q$. (Since
$1/p + 1/q = 1$, we see that $p$ is the conjugate number of $q$. Note that 2 is
its own conjugate, and that as $p \to 1$ (respectively $+\infty$) its conjugate
$\to +\infty$ (respectively 1); we therefore sometimes find it convenient to
consider 1 and $+\infty$ as the conjugates of each other.)

Exercises 

1. Prove that the measurability of $f$ guarantees the measurability of
$|f|^p$.

2. (a) Give an example of a function belonging to $L^1([0, 1])$ but not
belonging to $L^2([0, 1])$.

(b) Determine a set $A$ and a function belonging to $L^2(A)$ but not
belonging to $L^1(A)$. (As will become clear in §2, $\mu(A)$ must
be $+\infty$ in this case.)

§2. THE HÖLDER AND MINKOWSKI
INEQUALITIES

Theorem 1: If $f \in L^p$ and $g \in L^q$, then $fg \in L^1$.

Proof: Let $x = |f|^p$, $y = |g|^q$, and $r = 1/p$ (so that $1 - r = 1/q$),
and substitute these values into Theorem 1-2. We obtain $|f| \cdot |g| \le
(1/p)|f|^p + (1/q)|g|^q$. Since $|fg|$ ($= |f| \cdot |g|$) is the product of two measurable
functions, it is also measurable, and since $(1/p)|f|^p + (1/q)|g|^q$ is
summable, so is $|fg|$. We then know that $fg$ is summable; that is, $fg \in L^1$.

Theorem 2 (Hölder Inequality): If $f \in L^p$ and $g \in L^q$, then
$\|fg\|_1 \le \|f\|_p \|g\|_q$.

Proof: (a) By (b) of Theorem 1-1, the assertion is trivially true if
either $\|f\|_p = 0$ or $\|g\|_q = 0$.

(b) Suppose $\|f\|_p = 1$ and $\|g\|_q = 1$. Integrating the inequality
developed in Theorem 1, we obtain

$$\|fg\|_1 = \int |fg| \le \frac{1}{p}\int |f|^p + \frac{1}{q}\int |g|^q = \frac{1}{p} + \frac{1}{q} = 1 = \|f\|_p \|g\|_q.$$

(c) On account of (a), we may assume that $\|f\|_p > 0$ and $\|g\|_q > 0$.
Let $f_1 = (1/\|f\|_p) f$ and $g_1 = (1/\|g\|_q) g$. By (a) of Theorem 1-1, $\|f_1\|_p =
\|g_1\|_q = 1$; now, by (b) of the present proof,

$$\|f_1 g_1\|_1 \le 1, \quad\text{or}\quad \Bigl\| \frac{1}{\|f\|_p \|g\|_q}\, fg \Bigr\|_1 \le 1.$$

Again referring to (a) of Theorem 1-1, we obtain, as desired,

$$\|fg\|_1 \le \|f\|_p \|g\|_q.$$
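
The Hölder inequality can be tested numerically on step functions, for which the integrals are finite sums. In the sketch below (an illustration only, with hypothetical helper names `p_norm`, `conjugate`, `holder_sides`, and `random_step`), a list of $n$ complex values stands for the step function on $[0, 1)$ that takes the $k$-th value on the $k$-th of $n$ equal subintervals, so that $\int |f|^p$ is exactly the mean of $|v|^p$ over the list.

```python
import random

def p_norm(values, p):
    """L^p norm of the step function on [0, 1) with equal-width pieces."""
    n = len(values)
    return (sum(abs(v) ** p for v in values) / n) ** (1.0 / p)

def conjugate(p):
    """Conjugate number q = p / (p - 1), for 1 < p < infinity."""
    return p / (p - 1.0)

def holder_sides(f, g, p):
    """Return (||fg||_1, ||f||_p * ||g||_q) with q conjugate to p."""
    fg = [a * b for a, b in zip(f, g)]
    return p_norm(fg, 1), p_norm(f, p) * p_norm(g, conjugate(p))

random.seed(1)  # fixed seed so the sample is reproducible

def random_step(n):
    """A random complex-valued step function with n pieces."""
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
```

Calling `holder_sides(random_step(8), random_step(8), p)` for several values of `p` should always return a pair whose first entry does not exceed the second. Equality occurs, for example, for `f = [1, 2, 3]`, `g = [1, 4, 9]`, `p = 3`, where $|g|^q = |f|^p$ on every piece.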

Theorem 3: If $f$ and $g \in L^p$, then $f + g \in L^p$.

Proof:

$$|f + g|^p \le (|f| + |g|)^p \le (2 \max\{|f|, |g|\})^p = 2^p \max\{|f|^p, |g|^p\} \le 2^p (|f|^p + |g|^p).$$

(The last inequality follows from the obvious fact that the larger of two
non-negative quantities is at most equal to their sum.) Hence $|f + g|^p$
is summable, and so $f + g \in L^p$.

Corollary: If $f_1, f_2, \ldots, f_n \in L^p$, then any linear combination of
these functions, $\alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n$, also belongs to $L^p$.

Proof: Use Theorem 3, (a) of Theorem 1-1, and finite induction.

Theorem 4 (Minkowski Inequality): If $f$ and $g \in L^p$, then
$\|f + g\|_p \le \|f\|_p + \|g\|_p$. (By Theorem 3 we know that the left side of
this inequality is meaningful.)

Proof: (a) If $p = 1$, we argue as follows: $|f + g| \le |f| + |g|$,
hence $\int |f + g| \le \int |f| + \int |g|$, or $\|f + g\|_1 \le \|f\|_1 + \|g\|_1$.

(b) If $p > 1$, we argue as follows: $|f + g|^p = |f + g| \cdot |f + g|^{p-1} \le
(|f| + |g|) \cdot |f + g|^{p-1} = |f| \cdot |f + g|^{p-1} + |g| \cdot |f + g|^{p-1}$. Thus,

$$\int |f + g|^p \le \int |f| \cdot |f + g|^{p-1} + \int |g| \cdot |f + g|^{p-1}.$$

In Theorem 2 replace $f$ by $|f|$ and $g$ by $|f + g|^{p-1}$; also note that $\|f\|_p =
\|\,|f|\,\|_p$ and $\|f + g\|_p = \|\,|f + g|\,\|_p$. We obtain (using the fact that
$(p - 1)q = p$)

$$\int |f| \cdot |f + g|^{p-1} \le \|f\|_p \cdot \|\,|f + g|^{p-1}\|_q = \|f\|_p \cdot \Bigl(\int |f + g|^{(p-1)q}\Bigr)^{1/q} = \|f\|_p \cdot \Bigl(\int |f + g|^p\Bigr)^{1/q} = \|f\|_p \cdot \|f + g\|_p^{p/q}.$$

Similarly, $\int |g| \cdot |f + g|^{p-1} \le \|g\|_p \cdot \|f + g\|_p^{p/q}$. Hence, $\int |f + g|^p \le (\|f\|_p + \|g\|_p) \cdot
(\|f + g\|_p)^{p/q}$, or $\|f + g\|_p^p \le (\|f\|_p + \|g\|_p) \cdot (\|f + g\|_p)^{p/q}$. Dividing by
$(\|f + g\|_p)^{p/q}$ we obtain $(\|f + g\|_p)^{p - p/q} \le \|f\|_p + \|g\|_p$. But $p - p/q =
p(1 - 1/q) = p \cdot 1/p = 1$. Hence, $\|f + g\|_p \le \|f\|_p + \|g\|_p$. (If $\|f + g\|_p =
0$, the division is not legitimate, but in this case no proof is needed.)
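
As with the Hölder inequality, the Minkowski inequality can be checked numerically on step functions. This is a sketch under the same assumptions as before: a list of $n$ values stands for a step function on $[0, 1)$ with $n$ equal pieces, and the names `p_norm`, `minkowski_sides`, and `random_step` are hypothetical.

```python
import random

def p_norm(values, p):
    """L^p norm of the step function on [0, 1) with equal-width pieces."""
    n = len(values)
    return (sum(abs(v) ** p for v in values) / n) ** (1.0 / p)

def minkowski_sides(f, g, p):
    """Return (||f + g||_p, ||f||_p + ||g||_p)."""
    total = [a + b for a, b in zip(f, g)]
    return p_norm(total, p), p_norm(f, p) + p_norm(g, p)

random.seed(2)  # fixed seed so the sample is reproducible

def random_step(n):
    """A random complex-valued step function with n pieces."""
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
```

For any `p >= 1` the first entry of `minkowski_sides(f, g, p)` should not exceed the second. The two sides agree when one function is a positive multiple of the other, for instance `f = [2, 4]` and `g = [1, 2]`.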

Exercises 

1. Prove the Hölder and Minkowski inequalities in the particular
case $p = q = 2$ by exploiting the fact that $\int |f + \lambda g|^2 \ge 0$ for
all values of the real parameter $\lambda$.





2. (a) Assuming that $f$ and $g$ never vanish on $A$, prove that $\|fg\|_1 =
\|f\|_p \|g\|_q$ iff the fractions $|f|^p/|g|^q$ and $fg/|fg|$ are both constant
almost everywhere on $A$.

(b) Assuming that $\|f\|_p$ and $\|g\|_p$ are both positive, show that
$\|f + g\|_p = \|f\|_p + \|g\|_p$ iff there exists a positive constant $\alpha$
such that $f(x) = \alpha g(x)$ almost everywhere.

3. In the inequality $|fg| \le (1/p)|f|^p + (1/q)|g|^q$, which is employed
in Theorem 1, replace $f$ by $cf$ and $g$ by $g/c$, where $c$ is an
unspecified positive constant. The preceding inequality
becomes

$$|fg| \le \frac{c^p}{p}\,|f|^p + \frac{c^{-q}}{q}\,|g|^q,$$

and by integration we obtain

$$\|fg\|_1 \le \frac{c^p}{p}\,\|f\|_p^p + \frac{c^{-q}}{q}\,\|g\|_q^q.$$

Show how to obtain the general form of the Hölder inequality
directly from this result.


§3. DEFINITION OF A METRIC IN $L^p$

For any two functions $f$ and $g$ belonging to $L^p$ let us define $\rho(f, g)$ as
$\|f - g\|_p$. (According to Theorem 2-3, $\|f - g\|_p$ is finite; we merely
replace $g$ by $-g$.) Clearly, $\rho(f, g) = \rho(g, f)$, and from Theorem 2-4 we
obtain

$$\rho(f, h) = \|f - h\|_p = \|(f - g) + (g - h)\|_p \le \|f - g\|_p + \|g - h\|_p = \rho(f, g) + \rho(g, h).$$

Referring back to Chapter 1, we see that the second and third axioms of a
metric space are satisfied. However, a slight complication arises in
connection with the first axiom, for the vanishing of $\rho(f, g)$ does not
guarantee that $f$ and $g$ are the same function; according to Theorem 1-1,
$f$ and $g$ may differ on a null-set (but not on any larger set). This
observation suggests how to salvage the situation. Instead of considering $L^p$
to consist of functions (defined on $A$) as we have done until now, we form
equivalence classes of these functions, two functions being considered
equivalent iff they coincide almost everywhere on $A$. It is easily shown
that this is indeed an equivalence relation and that the resulting collection
of equivalence classes becomes a metric space (which we continue to
denote $L^p$) if the “distance” between any two equivalence classes, $E_1$ and
$E_2$, is defined as $\|f_1 - f_2\|_p$, where $f_1$ and $f_2$ are any members of $E_1$ and $E_2$,
respectively. (Cf. Exercise 1.)

We continue to use such expressions as “consider the function $f$ in
$L^p$”; but this must now be understood as a condensed version of the
statement “consider the class of all p-th power summable functions which
differ from $f$ only on a null-set.”

An alternative device, which avoids the need of forming equivalence
classes, is to consider $L^p$ as a pseudo-metric space, in which, while
continuing to require that $\rho(f, g) \ge 0$ for all elements $f$ and $g$, we permit
$\rho(f, g) = 0$ to hold even though $f \ne g$. Which of these two schemes is
chosen seems to be entirely a matter of taste; mathematically, they are
almost certainly of equal value.

We conclude this section with the remark that the distance function
which has been defined is translation-invariant. By this we mean that for
any three functions $f$, $g$, $h$ belonging to $L^p$ the distance between $f$ and $g$ is
the same as the distance between $f + h$ and $g + h$. In the following
chapter we shall deal with a class of metric spaces in which the distance
function possesses this important property.

Exercise

1. Justify in detail the assertion that the “new” $L^p$ is indeed a metric
space.


§4. COMPLETENESS OF $L^p$

We recall that a metric space $M$ is said to be complete if every Cauchy
sequence $x_1, x_2, \ldots$ in $M$ is convergent; that is, there exists a member $x$
of $M$ such that $\lim_{n\to\infty} \rho(x_n, x) = 0$. The following theorem, which
asserts that $L^p$ is complete (in fact, it asserts even more than this), is
justly regarded as one of the great theorems of analysis, and in particular
one of the highlights of the Lebesgue theory.

Theorem 1 (Riesz-Fischer): Let $f_1, f_2, f_3, \ldots$ be a Cauchy sequence
in $L^p$. Then there exists a function $f$ in $L^p$ such that $\|f - f_n\|_p \to 0$.
Furthermore, although the sequence may fail to converge (pointwise)
anywhere in $A$, it is always possible to select a subsequence of the given
sequence which converges almost everywhere in $A$ to $f$.

Proof: (a) Taking account of Exercise 4-4 of Chapter 1, we see
that it suffices to prove that we can select a subsequence of the given
sequence, say $\tilde f_1, \tilde f_2, \ldots$, and a function $f$ in $L^p$ such that $\|f - \tilde f_n\|_p \to 0$
and $\tilde f_n(x) \to f(x)$ for almost all points $x$ in $A$.

(b) Since the sequence $f_1, f_2, \ldots$ is Cauchy, we can choose an index
$N_1$ so large that $\|f_n - f_m\| < 1/2$ whenever $n$ and $m > N_1$. Then we can
choose $N_2$ larger than $N_1$ such that $\|f_n - f_m\| < 1/2^2$ whenever $n$ and
$m > N_2$, and we continue this procedure. ($\|\ \|$ means $\|\ \|_p$; the numbers
$1/2, 1/2^2, 1/2^3, \ldots$ can be replaced by any other sequence of positive
numbers whose sum is finite.) Then we can choose a subsequence $\tilde f_1, \tilde f_2,
\tilde f_3, \ldots$ in such a manner that $\|\tilde f_1 - \tilde f_2\| < 1/2$, $\|\tilde f_2 - \tilde f_3\| < 1/2^2$, etc.
(For example, let $\tilde f_1 = f_{N_1+1}$, $\tilde f_2 = f_{N_2+1}$, etc.) Let $g_1 = |\tilde f_1 - \tilde f_2|$, $g_2 =
|\tilde f_2 - \tilde f_3|$, etc. Since $\|g_k\| = \|\tilde f_k - \tilde f_{k+1}\| < 1/2^k$, we obtain, by the
triangle inequality,

$$\|g_1 + g_2 + \cdots + g_n\| \le \|g_1\| + \|g_2\| + \cdots + \|g_n\| < \frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^n} < 1.$$

Let $s_n = \sum_{k=1}^{n} g_k$. Since the $g$'s are non-negative, the functions $s_n$ form a monotone
non-decreasing sequence, and their norms are bounded above by unity.
By the monotone convergence theorem, there exists a function $\tau$ belonging
to $L^p$ such that $\int \tau^p \le 1$ and $s_n(x) \to \tau(x)$ everywhere; of course, $\tau(x)$ is
finite almost everywhere. Therefore, the series $\sum_{k=1}^{\infty} |\tilde f_k - \tilde f_{k+1}|$ converges
to a finite sum almost everywhere; a fortiori, the series $\sum_{k=1}^{\infty} (\tilde f_k - \tilde f_{k+1})$
converges almost everywhere to a function $h$ such that $\|h\| \le \|\tau\| \le 1$.
Since the partial sum $\sum_{k=1}^{n} (\tilde f_k - \tilde f_{k+1})$ telescopes to $\tilde f_1 - \tilde f_{n+1}$, we conclude
that $\lim_{n\to\infty} \tilde f_{n+1}$ exists finitely almost everywhere and equals $\tilde f_1 - h$,
which, being the sum of two members of $L^p$, is also a member of $L^p$. We
have therefore shown that $\tilde f_n$ converges pointwise almost everywhere to the
function $f = \tilde f_1 - h$, which belongs to $L^p$.

(c) Now we shall show that $\|f - \tilde f_n\| \to 0$. From (b) it is clear that
$|\tilde f_n - \tilde f_1| \le \tau$ and $|f - \tilde f_1| \le \tau$. Hence,

$$|f - \tilde f_n| = |(f - \tilde f_1) + (\tilde f_1 - \tilde f_n)| \le |f - \tilde f_1| + |\tilde f_n - \tilde f_1| \le 2\tau,$$

and so $|f - \tilde f_n|^p \le 2^p \tau^p$. Since $\tau^p$ is summable and since $|f - \tilde f_n|^p \to
0$ almost everywhere, we may apply the Lebesgue dominated convergence
theorem to the sequence of functions $|f - \tilde f_1|^p, |f - \tilde f_2|^p, \ldots$.
Thus, $\lim_{n\to\infty} \int |f - \tilde f_n|^p = \int \lim_{n\to\infty} |f - \tilde f_n|^p = \int 0 = 0$, and, therefore,
$\|f - \tilde f_n\| \to 0$.

We now extend the preceding results a little further. Suppose that
a different subsequence, say $\hat f_1, \hat f_2, \hat f_3, \ldots$, converges almost everywhere,
say to a function $g$. By repeating the preceding arguments we see that
$\|g - \hat f_n\| \to 0$; then, since $\|f - g\| \le \|f - \hat f_n\| + \|g - \hat f_n\|$, we see that
$\|f - g\| = 0$, and so $f = g$ almost everywhere. Thus, we have obtained
the following result.

Corollary: If a Cauchy sequence in $L^p$ converges in the metric of
$L^p$ to $f$ and pointwise almost everywhere to $g$, then $f$ and $g$ are equivalent;
i.e., $f(x) = g(x)$ almost everywhere.
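
The remark in Theorem 1 that a sequence may converge in $L^p$ without converging pointwise anywhere is often illustrated by the so-called "typewriter" sequence of indicator functions on $[0, 1)$. The sketch below uses that standard example, which is not taken from the text; the names `level_and_offset`, `f`, and `norm_p` are hypothetical. For $n = 2^k + j$ with $0 \le j < 2^k$, the function $f_n$ is the indicator of $[j/2^k, (j+1)/2^k)$.

```python
def level_and_offset(n):
    """Write n = 2**k + j with 0 <= j < 2**k and return (k, j); n >= 1."""
    k = n.bit_length() - 1
    return k, n - (1 << k)

def f(n, x):
    """Indicator of [j/2**k, (j+1)/2**k) evaluated at x, where n = 2**k + j."""
    k, j = level_and_offset(n)
    return 1.0 if j / 2 ** k <= x < (j + 1) / 2 ** k else 0.0

def norm_p(n, p):
    """||f_n||_p = (length of the interval)**(1/p) = 2**(-k/p)."""
    k, _ = level_and_offset(n)
    return (2.0 ** -k) ** (1.0 / p)
```

Here `norm_p(n, p)` tends to zero as `n` grows, so the sequence converges to the zero function in the metric of $L^p$. At a fixed point such as `x = 0.3`, however, `f(n, x)` equals 1 once on every level `k`, so the values do not converge. The subsequence `f(2**k, x)` does tend to 0 for every `x > 0`, in agreement with the theorem and its corollary.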



It may be advisable to emphasize that the considerations of this
section are valid for $p \ge 1$, not just for $p > 1$.


Exercise 

1. (a) Show that for any positive integers $m$ and $n$, the following
equalities are correct:

(i) $\int_0^{2\pi} \sin^2 nx = \int_0^{2\pi} \cos^2 nx = \pi$;

(ii) $\int_0^{2\pi} \sin nx \sin mx = \int_0^{2\pi} \cos nx \cos mx = 0$ if $m \ne n$;

(iii) $\int_0^{2\pi} \sin mx \cos nx = 0$, even if $m = n$.

(b) Let $f \in L^2([0, 2\pi])$. Show that the integrals $\int_0^{2\pi} f(x) \cos nx$ and
$\int_0^{2\pi} f(x) \sin nx$ exist for $n = 1, 2, 3, \ldots$.

(c) Let the sequences $a_1, a_2, a_3, \ldots$ and $b_1, b_2, b_3, \ldots$ of complex
numbers be given and suppose that the series $\sum_{k=1}^{\infty} |a_k|^2$ and
$\sum_{k=1}^{\infty} |b_k|^2$ both converge (to finite sums). Show that there
exists a function $f$ in $L^2([0, 2\pi])$ such that all the equalities
$a_k = (1/\pi) \int_0^{2\pi} f(x) \cos kx$, $b_k = (1/\pi) \int_0^{2\pi} f(x) \sin kx$ hold. Hint:
Form the sequence of functions $f_1, f_2, f_3, \ldots$, where

$$f_n(x) = \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx).$$

A ----1 


§5. THE SPACE $L^\infty$

How should we interpret $L^p$ when $p = +\infty$? While we could simply
give a definition, we shall try to motivate the answer. For simplicity,
suppose $\mu(A)$ is finite (but, of course, positive). If $f \in L^p$ and if $|f| \le C$
everywhere on $A$, then $\int |f|^p \le \int C^p = C^p \mu(A)$, and so $\|f\|_p \le C(\mu(A))^{1/p}$.
As $p \to +\infty$, $(\mu(A))^{1/p} \to 1$, and so $\limsup_{p\to+\infty} \|f\|_p \le C$. (Note that if $f$
is measurable and bounded and $\mu(A)$ is finite, then $f \in L^p$ for all values of
$p$.) On the other hand, suppose $|f| \ge D$ on some set $B$ of positive measure.
Then $\int_A |f|^p \ge \int_B |f|^p \ge \int_B D^p = D^p \mu(B)$, and so $\|f\|_p \ge D(\mu(B))^{1/p}$.
Letting $p \to +\infty$, we obtain $\liminf_{p\to+\infty} \|f\|_p \ge D$, since $(\mu(B))^{1/p} \to 1$.

Putting together these two results, we see that $\lim_{p\to+\infty} \|f\|_p$ exists iff $f$
is equivalent to a bounded function; of course, this includes the case that
$f$ itself is bounded. When this condition is satisfied $f$ is said to be essentially
bounded. A little thought will show that when the aforementioned limit
$\lim_{p\to+\infty} \|f\|_p$ exists it is equal to the quantity $\inf_{g \sim f} \sup |g|$. (Recall that $g \sim f$
means that $g = f$ almost everywhere.) This quantity is called the essential
supremum of $|f|$, denoted ess sup $|f|$; by its very definition it is the same
for two equivalent functions. (If $f$ is not essentially bounded, then it is
not difficult to see that $\|f\|_p \to +\infty$. The reader is asked to justify these
statements in Exercise 1.)

Therefore, we define $L^\infty$ to be the set of all essentially bounded
functions (more precisely, the set of equivalence classes of all measurable
essentially bounded functions), and

$$\|f\|_\infty = \operatorname{ess\,sup} |f|.$$
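
The motivation for this definition can be watched numerically. The following sketch assumes a step function on a set of total measure 1, described by hypothetical `(length, value)` pairs; the helper names `p_norm` and `ess_sup` are likewise invented for this illustration. A piece of length zero plays the role of a null-set and is ignored by both helpers.

```python
def p_norm(pieces, p):
    """L^p norm of a step function given as (length, value) pairs."""
    return sum(length * abs(v) ** p
               for length, v in pieces if length > 0) ** (1.0 / p)

def ess_sup(pieces):
    """Largest |value| taken on a piece of positive length."""
    return max(abs(v) for length, v in pieces if length > 0)

# Hypothetical step function; the last piece has measure zero, so its
# large value affects neither the norms nor the essential supremum.
pieces = [(0.5, 1.0), (0.49, -2.0), (0.01, 3.0), (0.0, 100.0)]
norms = {p: p_norm(pieces, p) for p in (1, 2, 10, 100, 500)}
```

In `norms` the values increase with `p` and approach `ess_sup(pieces)`, which is 3 here, even though the value 3 is taken only on a set of measure 0.01 and the value 100 on the null piece is disregarded.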

The reader should have no difficulty in seeing that all the results
obtained in the earlier sections of this chapter continue to hold for $p = +\infty$,
provided that the corresponding value of $q$ is taken as unity; in fact,
most of the proofs, particularly that of the Riesz-Fischer theorem, are
easier for $p = +\infty$ than for $p < +\infty$.

Incidentally, note carefully that, although the motivation for our
definition of $L^\infty$ required that $\mu(A)$ should be finite, the definition is still
applicable when $\mu(A) = +\infty$.

Exercise

1. Prove the assertions made concerning the behavior of $\|f\|_p$ as
$p \to +\infty$ (under the assumption that $\mu(A)$ is finite).


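§6. THE SPACES $l^p_n$ AND $l^p$

Let $n$ be an arbitrary but fixed positive integer and let $1 \le p < +\infty$.
For each pair of ordered $n$-tuples of complex numbers,

$a = (\alpha_1, \alpha_2, \ldots, \alpha_n)$

and $b = (\beta_1, \beta_2, \ldots, \beta_n)$, we define $\rho_p(a, b)$ as $(\sum_{k=1}^{n} |\alpha_k - \beta_k|^p)^{1/p}$.
Clearly $\rho_p(a, b) = \rho_p(b, a)$, $\rho_p(a, b) > 0$ if $a \ne b$, and $\rho_p(a, b) = 0$ if
$a = b$. In order to prove that $\rho_p$ defines a metric on the collection of all
ordered $n$-tuples it therefore suffices to show that the triangle inequality
holds. To accomplish this, we let $A$ be the subset $[0, n)$ of $R$ and associate
with the $n$-tuples $a$ and $b$ the functions $\tilde a$ and $\tilde b$, respectively, where
$\tilde a(x) = \alpha_k$ when $x \in [k - 1, k)$, $k = 1, 2, \ldots, n$, and similarly for $\tilde b$.
It is obvious that $\tilde a$, $\tilde b$, and $\tilde a - \tilde b$ belong to $L^p(A)$, and that $\|\tilde a - \tilde b\|_p =
\rho_p(a, b)$. If $c$ denotes a third $n$-tuple, $(\gamma_1, \gamma_2, \ldots, \gamma_n)$, and $\tilde c$ denotes
the corresponding member of $L^p(A)$, we obtain from Theorem 2-4 the
following: $\rho_p(a, b) = \|(\tilde a - \tilde c) + (\tilde c - \tilde b)\|_p \le \|\tilde a - \tilde c\|_p + \|\tilde c - \tilde b\|_p =
\rho_p(a, c) + \rho_p(c, b)$, or, more explicitly,

$$\Bigl(\sum_{k=1}^{n} |\alpha_k - \beta_k|^p\Bigr)^{1/p} \le \Bigl(\sum_{k=1}^{n} |\alpha_k - \gamma_k|^p\Bigr)^{1/p} + \Bigl(\sum_{k=1}^{n} |\gamma_k - \beta_k|^p\Bigr)^{1/p}.$$  (6-1)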



Thus, we have shown that the collection of all ordered $n$-tuples of complex
numbers is a metric space if the distance-function previously defined is
employed. (In particular, for $n = 3$ this result serves to complete the
discussion of Example (m) of §1-2.) The metric space thus defined is
denoted $l^p_n$. As in §5, we may let $p$ approach $+\infty$, and we then obtain the
sup metric: $\rho_\infty(a, b) = \sup_{k=1,2,\ldots,n} \{|\alpha_k - \beta_k|\}$. The corresponding metric
space is denoted, of course, by the symbol $l^\infty_n$.

If $1 < p < +\infty$ and $q$ is the conjugate number of $p$, the application
of Theorem 2-2 to the functions $\tilde a$ and $\tilde b$ defined previously leads
immediately to the inequality

$$\sum_{k=1}^{n} |\alpha_k \beta_k| \le \Bigl(\sum_{k=1}^{n} |\alpha_k|^p\Bigr)^{1/p} \cdot \Bigl(\sum_{k=1}^{n} |\beta_k|^q\Bigr)^{1/q}.$$  (6-2)

Again we may let $p$ approach $+\infty$; this leads to the trivial inequality

$$\sum_{k=1}^{n} |\alpha_k \beta_k| \le \Bigl(\sup_{k=1,2,\ldots,n} |\alpha_k|\Bigr) \cdot \sum_{k=1}^{n} |\beta_k|.$$

We emphasize that for different values of $p$ the metric spaces $l^p_n$
consist of the same objects, but that the distance between two fixed
$n$-tuples varies, in general, with $p$. (Cf. Exercise 1.)
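
A short computation makes the dependence on $p$ concrete. This is a minimal sketch; the function name `rho` and the sample tuples are assumptions chosen for illustration, and `float("inf")` is used to select the sup metric.

```python
def rho(a, b, p):
    """Distance between two n-tuples in l^p_n; p = float('inf') gives the sup metric."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

a, b = (0, 0), (3, 4)      # two non-zero differences: distance varies with p
c, d = (1, 1), (1, 5)      # one non-zero difference: distance is always 4
distances = {p: rho(a, b, p) for p in (1, 2, float("inf"))}
```

For the pair `a`, `b` the distance is 7, 5, and 4 for $p = 1$, $2$, and $+\infty$ respectively, while for the pair `c`, `d`, which differ in a single component, `rho` returns 4 for every `p`.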

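It is natural to consider the possibility of making $n$ infinite in the
preceding considerations. Everything goes through as before, provided,
as might be expected, that we confine attention to those sequences
$(\alpha_1, \alpha_2, \alpha_3, \ldots)$ which satisfy the condition $\sum_{k=1}^{\infty} |\alpha_k|^p < +\infty$. We sketch
the essential ideas very briefly. If $\sum_{k=1}^{\infty} |\alpha_k|^p$ and $\sum_{k=1}^{\infty} |\beta_k|^p$ converge,
then by setting all the quantities $\gamma_k$ appearing in (6-1) equal to zero we
obtain

$$\Bigl(\sum_{k=1}^{n} |\alpha_k - \beta_k|^p\Bigr)^{1/p} \le \Bigl(\sum_{k=1}^{n} |\alpha_k|^p\Bigr)^{1/p} + \Bigl(\sum_{k=1}^{n} |\beta_k|^p\Bigr)^{1/p}.$$  (6-3)

Fixing $n$ temporarily on the left side and exploiting the assumed
convergence of the infinite series mentioned previously, we obtain

$$\Bigl(\sum_{k=1}^{n} |\alpha_k - \beta_k|^p\Bigr)^{1/p} \le \Bigl(\sum_{k=1}^{\infty} |\alpha_k|^p\Bigr)^{1/p} + \Bigl(\sum_{k=1}^{\infty} |\beta_k|^p\Bigr)^{1/p}.$$

Now we may let $n$ approach $+\infty$ on the left, and we obtain

$$\Bigl(\sum_{k=1}^{\infty} |\alpha_k - \beta_k|^p\Bigr)^{1/p} \le \Bigl(\sum_{k=1}^{\infty} |\alpha_k|^p\Bigr)^{1/p} + \Bigl(\sum_{k=1}^{\infty} |\beta_k|^p\Bigr)^{1/p}.$$  (6-4)

This shows that the series $\sum_{k=1}^{\infty} |\alpha_k - \beta_k|^p$ converges. If $\sum_{k=1}^{\infty} |\gamma_k|^p$ also
converges, we may replace $\alpha_k$ and $\beta_k$ by $\alpha_k - \gamma_k$ and $\beta_k - \gamma_k$,
respectively, and we thus obtain

$$\Bigl(\sum_{k=1}^{\infty} |\alpha_k - \beta_k|^p\Bigr)^{1/p} \le \Bigl(\sum_{k=1}^{\infty} |\alpha_k - \gamma_k|^p\Bigr)^{1/p} + \Bigl(\sum_{k=1}^{\infty} |\beta_k - \gamma_k|^p\Bigr)^{1/p}.$$  (6-5)

Thus, we have shown that the set of all sequences of complex numbers
whose components are p-th power summable becomes a metric space if
the distance between the sequences $(\alpha_1, \alpha_2, \alpha_3, \ldots)$ and $(\beta_1, \beta_2, \beta_3, \ldots)$
is defined by the left side of (6-5). This metric space is denoted $l^p$; in
contrast to $l^p_n$, a sequence belonging to $l^{p_1}$ need not belong to $l^{p_2}$ if $p_1 \ne p_2$.
(Cf. Exercise 2.)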

Just as we obtained (6-5) as a generalization of (6-1), so we can
generalize (6-2) to furnish the inequality

$$\sum_{k=1}^{\infty} |\alpha_k \beta_k| \le \Bigl(\sum_{k=1}^{\infty} |\alpha_k|^p\Bigr)^{1/p} \cdot \Bigl(\sum_{k=1}^{\infty} |\beta_k|^q\Bigr)^{1/q},$$  (6-6)

provided that the sequences $(\alpha_1, \alpha_2, \alpha_3, \ldots)$ and $(\beta_1, \beta_2, \beta_3, \ldots)$ belong
to $l^p$ and $l^q$, respectively. As before, we may define $l^\infty$ as the collection of
all bounded sequences, the distance between the sequences $(\alpha_1, \alpha_2, \alpha_3, \ldots)$
and $(\beta_1, \beta_2, \beta_3, \ldots)$ being defined as $\sup_{k=1,2,\ldots} |\alpha_k - \beta_k|$.

The inequalities (6-2) and (6-6), which are both easy consequences
of the inequality developed in Theorem 2-2, are, like the latter inequality,
also termed Hölder's inequality; similarly, (6-3) and (6-4) are termed
Minkowski's inequality.

Finally, we consider briefly the analogue of the Riesz-Fischer
theorem. As might be expected, the metric spaces $l^p_n$ and $l^p$ are complete
for all $p$, including $p = +\infty$. Since the case of finite $n$ is easily handled
directly, or may easily be obtained as a corollary of the case of infinite $n$,
we confine the statement of the following theorem to the $l^p$ spaces.

Theorem 1: For any $p$, $1 \le p \le +\infty$, the metric space $l^p$ is
complete. If the sequence $a_1, a_2, a_3, \ldots$, where $a_k = (\alpha_1^{(k)}, \alpha_2^{(k)}, \ldots)$,
is Cauchy, then for each positive integer $m$ the sequence $\alpha_m^{(1)}, \alpha_m^{(2)}, \ldots$
converges to a number $\alpha_m$; the sequence $(\alpha_1, \alpha_2, \alpha_3, \ldots)$ belongs to $l^p$, and
is the limit of the given Cauchy sequence. (Less formally, we may say that
the given sequence $a_1, a_2, a_3, \ldots$ converges component-wise to a member of
$l^p$, say $a$, and $\lim_{k\to\infty} \rho_p(a_k, a) = 0$.)

Proof: Corresponding to each vector $a_k$, we associate the function
$\tilde a_k$, defined on $[0, +\infty)$ as follows: $\tilde a_k(t) = \alpha_m^{(k)}$ for $t \in [m - 1, m)$. It is
obvious that $\tilde a_k \in L^p([0, +\infty))$ and that $\rho_p(a_j, a_k) = \|\tilde a_j - \tilde a_k\|_p$. Hence,
the functions $\tilde a_1, \tilde a_2, \tilde a_3, \ldots$ form a Cauchy sequence in $L^p([0, +\infty))$.
By the Riesz-Fischer theorem, this Cauchy sequence contains a
subsequence which converges almost everywhere, and from the very simple
form of the functions involved it is obvious that this subsequence must
converge everywhere, not merely almost everywhere, on $[0, +\infty)$ to a
function $\tilde a$ which is constant on each interval $[m - 1, m)$, $m = 1, 2, 3,
\ldots$. If the value of $\tilde a(t)$ on $[m - 1, m)$ is denoted $\alpha_m$, it is evident that
the sequence* $a = (\alpha_1, \alpha_2, \alpha_3, \ldots)$ belongs to $l^p$ (since $\tilde a \in L^p([0, +\infty))$)
and that the sequence $a_1, a_2, a_3, \ldots$ converges to $a$.

The arguments presented in this section are, in fact, unnecessarily 
elaborate, for no reference is needed to integration. In Exercise 3 the 
reader is asked to provide direct proofs of the principal results of this 
section. 

Exercises 

1. Prove that the distance between the $n$-tuples $a =
(\alpha_1, \alpha_2, \ldots, \alpha_n)$ and $b = (\beta_1, \beta_2, \ldots, \beta_n)$ is independent of $p$
iff at most one of the $n$ quantities $\alpha_k - \beta_k$ is non-zero.

2. If $1 \le p_1 < p_2 \le +\infty$, show that $l^{p_1} \subset l^{p_2}$.

3. Prove the inequalities (6-2) and (6-3) and Theorem 1 without
employing any of the theory of integration.


§7. SEPARABILITY OF $L^p$ AND $l^p$

Definition 1: A metric space is said to be separable if it possesses
a dense subset consisting of a finite or countably infinite number of points.

Examples 

(a) Trivially, any metric space consisting of a finite or countably 
infinite number of points is separable, for the entire space is a dense 
subset of itself. 

(b) The metric space defined in Example (b) of §1-2 is separable iff it 
consists of a finite or countably infinite number of points (for no proper 
subset of this space can be dense). 

(c) In contrast to the preceding example, we note that $R$, although
uncountable, is separable, for the countably infinite subset $Q$ is dense.
Similarly, the complex number system $C$ is separable, for the subset consisting
of all complex-rational numbers (i.e., those whose real and imaginary parts
are both rational) is both dense and countably infinite.

* One must distinguish carefully between the objects $a_1, a_2, a_3, \ldots$, which are
sequences of complex numbers (i.e., sequences in $C$) and the sequence $a_1, a_2, a_3, \ldots$
(which is a sequence in $l^p$).

We now proceed to prove that both $L^p(A)$ and $l^p$ are separable.
(The separability of $l^p_n$, which should be obvious, is implicitly proven
during the proof for $l^p$.) We shall first consider $l^p$, for the argument is
somewhat simpler in this case, and then we shall merely sketch the proof
for $L^p(A)$.

Theorem 1: $l^p$ is separable (for $1 \le p < +\infty$).

Proof: Let $S$ be the subset of $l^p$ consisting of those sequences
containing only complex-rational terms of which only a finite number (if
any) differ from zero. By an elementary argument (cf. Exercise 3) it is
shown that $S$ is countably infinite. Given any positive number $\epsilon$ and any
member $a = (\alpha_1, \alpha_2, \alpha_3, \ldots)$ of $l^p$, we can, because of the convergence of
the series $\sum_{k=1}^{\infty} |\alpha_k|^p$, choose an integer $N$ so large that $\rho_p(a, a_N) < \epsilon/2$,
where $a_N$ is the sequence $(\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_N, 0, 0, 0, \ldots)$. Then it
follows from the denseness of the complex-rational numbers in $C$ (and
from the finiteness of $N$) that we can choose a vector

$b = (\beta_1, \beta_2, \ldots, \beta_N, 0, 0, \ldots)$

belonging to $S$ such that $\rho_p(a_N, b) < \epsilon/2$. It now follows by the triangle
inequality that $\rho_p(a, b) < \epsilon$, and so it has been shown that $S$ is dense, and
hence, that $l^p$ is separable. Note that this argument fails for $p = +\infty$;
in fact, $l^\infty$ is not separable. (Cf. Exercise 2.)
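
The two steps of this proof (truncate, then replace the surviving terms by complex-rational numbers) can be carried out explicitly. The sketch below is an illustration for $p = 2$ only; the sequence `alpha`, with $|\alpha_k| = \sqrt{2}/k$, and the helper names are hypothetical. It relies on the elementary bound $\sum_{k>N} 1/k^2 < 1/N$ to choose $N$, and on `Fraction.limit_denominator` to pick rational real and imaginary parts.

```python
import cmath
import math
from fractions import Fraction

P = 2  # the exponent p used in this illustration

def alpha(k):
    """Hypothetical member of l^2 with |alpha_k| = sqrt(2)/k."""
    return (math.sqrt(2) / k) * cmath.exp(1j * k)

def truncation_index(eps):
    """N with tail norm < eps/2: sum_{k>N} |alpha_k|^2 < 2/N <= (eps/2)^2."""
    return math.ceil(8 / eps ** 2)

def rational_point(z, denom):
    """Rational real and imaginary parts, each within 1/(2*denom) of z's."""
    return (Fraction(z.real).limit_denominator(denom),
            Fraction(z.imag).limit_denominator(denom))

def approximate(eps):
    """Finitely many complex-rational terms b_1..b_N approximating alpha."""
    n = truncation_index(eps)
    delta = (eps / 2) / n ** (1.0 / P)   # error budget per coordinate
    denom = math.ceil(1 / delta)
    return [rational_point(alpha(k), denom) for k in range(1, n + 1)]

def distance_on_prefix(b):
    """rho_p between (alpha_1..alpha_N, 0, 0, ...) and (b_1..b_N, 0, 0, ...)."""
    return sum(abs(alpha(k) - complex(float(re), float(im))) ** P
               for k, (re, im) in enumerate(b, start=1)) ** (1.0 / P)
```

With `b = approximate(0.1)`, the value `distance_on_prefix(b)` stays below 0.05, and the discarded tail contributes less than another 0.05, so the finitely supported complex-rational sequence lies within 0.1 of the original, as in the proof.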

Theorem 2: $L^p(A)$ is separable (for $1 \le p < +\infty$).

Proof: We shall, for ease in exposition, confine attention to the
case $p = 1$, $A = [0, 1]$, leaving it to the reader to convince himself that
the argument can be modified easily to apply to any larger (finite) value of
$p$ and to any other measurable set $A$.

First we observe, by considering separately the real and imaginary
parts of any member of $L^1(A)$, that it suffices to confine attention to
real-valued functions; then, by expressing a real-valued function as the
difference of its positive and negative parts, we see that it suffices to show
that, given any non-negative real-valued function $f$ belonging to $L^1(A)$ and
any positive number $\epsilon$, there exists a function $g$ belonging to a fixed
countable subset of $L^1(A)$ such that $\|f - g\|_1 < \epsilon$. If $f$ is unbounded, we
see by invoking the monotone convergence theorem that there exists a
bounded non-negative member $f_1$ of $L^1(A)$ such that $\|f - f_1\|_1 < \epsilon/3$.
(If $f$ is bounded, we may merely choose $f_1 = f$.) By Lusin's theorem,
there exists a continuous function $f_2$ such that $0 \le f_2 \le \sup f_1$ everywhere
in $A$ while the set $\{x \mid f_1(x) \ne f_2(x)\}$ has measure less than $\epsilon/(3 \sup f_1)$.
It is then evident that $\|f_1 - f_2\|_1 < \epsilon/3$. Next, since $f_2$ is uniformly
continuous, we can choose a real-valued step-function $f_3$ such that
$|f_2(x) - f_3(x)| < \epsilon/3$ everywhere in $A$, so that $\|f_2 - f_3\|_1 = \int_0^1 |f_2 - f_3| <
\epsilon/3$; furthermore, we may impose on $f_3$ the conditions that it assumes
only rational values and that its discontinuities occur only at rational
points of $A$. It then follows that $\|f - f_3\|_1 \le \|f - f_1\|_1 + \|f_1 - f_2\|_1 +
\|f_2 - f_3\|_1 < \epsilon$. Since the collection of step-functions satisfying the
conditions imposed on $f_3$ is countable (the argument is virtually identical
with that employed in proving that the set $S$ employed in the proof of
Theorem 1 is countable), the proof is complete.

An alternative to the last part of the proof is the following: By the
Weierstrass approximation theorem (cf. Appendix D) and the denseness
of $Q$ in $R$, it is possible to choose for $f_3$ a polynomial with rational
coefficients (instead of a step-function). Since such polynomials form a
countable subset of $L^1(A)$, we conclude again that $L^1(A)$ is separable.

Taking account of the decomposition of each member of $L^1(A)$ into
real and imaginary parts, we see that each of the following countable
collections of functions is dense in $L^1(A)$:

(a) The collection of all step-functions having discontinuities only
at rational points in $A$ and assuming only complex-rational values.

(b) The collection of all polynomials possessing complex-rational
coefficients.

In fact, each of these collections is also dense in $L^p(A)$ for $1 < p <
+\infty$, and the same holds, with minor modifications in case (a), for any
bounded (measurable) set $A$. These facts should become apparent when the
reader extends the preceding proof to the case $p > 1$.

Exercises 

1. Prove that any non-empty subset of a separable metric space is 

also separable. 

2. Prove that $l^\infty$ is not separable. Hint: Show, by imitating the
diagonalization proof of the uncountability of $R$, that the
subset of $l^\infty$ consisting of sequences whose terms are exclusively
zeros and ones is not separable.

3. Justify the assertion made in the proof of Theorem 1 that the set
$S$ is countable. Hint: Exploit repeatedly the fact that the
cartesian product of two countable sets is also countable.



CHAPTER 4 


NORMED LINEAR SPACES 


In this chapter we shall present some of the most important ideas 
relating to normed linear spaces. The concept of a linear space involves, 
in its most general formulation, a field of scalars. However, we shall be 
interested exclusively in two fields of scalars, namely, the real and complex 
number systems, and, with occasional slight modifications (which will be 
pointed out at the appropriate places), the theory which we shall develop 
applies equally well to linear spaces over either of these two fields; in 
order to formulate our results in such a manner as to apply equally well 
to both cases, we shall usually employ the term “scalar” rather than 
“real number” or “complex number.” 


§1. LINEAR SPACES 

The reader is, almost certainly, already acquainted, at least on an 
intuitive basis, with the concept of a linear space (also called “vector 
space”). We shall, therefore, present the basic ideas very concisely, 
trusting the reader to fill in the necessary details. 


Definition 1: A linear space is a non-empty collection of objects,
called vectors, one of which is the zero vector (denoted $o$), which can be
added pairwise and multiplied by any scalar in a manner consistent with the
following laws*, which are to hold for all vectors $f$, $g$, $h$ and all
scalars $\alpha$, $\beta$:

(a) $f + g = g + f$;
(b) $f + (g + h) = (f + g) + h$;
(c) $\alpha(f + g) = (\alpha f) + (\alpha g)$;
(d) $\alpha(\beta f) = (\alpha\beta) f$;
(e) $(\alpha + \beta) f = (\alpha f) + (\beta f)$;
(f) $1f = f$;
(g) $f + o = f$;
(h) $0f = o$.

We shall refrain from proving, or even stating specifically, the vast
number of immediate consequences of this definition. For example, we
shall freely use an expression such as $e + f + g + h$, which, strictly
speaking, becomes meaningful only when parentheses are suitably
provided; since all (legitimate) distributions of parentheses fortunately
lead to the same vector, we may thus speak unambiguously of the vector
$e + f + g + h$. The proofs of this fact and innumerable others are
virtually identical with those that the reader has encountered in studying
the foundations of the real number system.


Examples 

Many of the sets used in Chapter 1 to provide illustrations of metric 
spaces also provide illustrations of linear spaces. However, it must be 
kept clearly in mind that even though one and the same collection of 
objects may sometimes be thought of as either a metric space or a linear 
space, two essentially different ideas are involved: in the former case, we 
are concerned only with assigning a distance between any two members 
of the collection, while in the latter case we are concerned with adding 
members of the collection and multiplying them by scalars. As a matter 
of fact, we shall be particularly interested in collections of objects which 
are both metric spaces and linear spaces, but we must delay briefly the 
introduction of these mathematical entities. 

(a) The most trivial example is furnished by the set consisting of a
single object $o$ satisfying the obvious rules: $o + o = o$, $\alpha o = o$ for any
scalar $\alpha$.

(b) The real number system, provided with the usual definition of
addition and multiplication, constitutes a real linear space (i.e., a linear
space over the real field).

(c) The complex number system, provided with the usual definition 
of addition and multiplication, constitutes a real linear space if we 
permit multiplication only by real numbers, while it constitutes a complex 
linear space if we permit multiplication by complex numbers. 

(d) The class of all continuous real-valued functions defined on any
interval (not necessarily bounded), if addition of two such functions and
multiplication by any real scalar are defined in the obvious manner,
becomes a real linear space. If, instead, we consider complex-valued
functions, we have either a real or a complex linear space, according as
we allow multiplication by real or by complex scalars.

* For clarity in exposition we agree to denote vectors by lower case Latin
letters and scalars by lower case Greek letters (except when specific numbers
appear as scalars).

(e) Similarly, we obtain linear spaces if in (d) we replace the
condition of continuity by the condition of boundedness (but not if we impose
the stronger restriction that $|f(x)| < 100$ for all points $x$ in the specified
interval [why?]).

(f) The set of ordered $n$-tuples of real numbers $(\alpha_1, \alpha_2, \ldots, \alpha_n)$
becomes a real linear space if we define addition of vectors and
multiplication by real scalars in the obvious manner:
$(\alpha_1, \alpha_2, \ldots, \alpha_n) + (\beta_1, \beta_2, \ldots, \beta_n) = (\alpha_1 + \beta_1, \alpha_2 + \beta_2, \ldots, \alpha_n + \beta_n)$
and $\gamma(\alpha_1, \alpha_2, \ldots, \alpha_n) = (\gamma\alpha_1, \gamma\alpha_2, \ldots, \gamma\alpha_n)$.

We conclude this introductory section with a few elementary defini- 
tions and theorems. 

Definition 2: A finite non-empty collection of vectors $\{f_1, f_2, \ldots, f_n\}$
is said to be linearly independent if the equality
$\alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n = o$ implies that
$\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$. An infinite collection is
said to be linearly independent if every finite (non-empty) subcollection is
linearly independent.

Definition 3: A non-empty set $S$ of vectors is termed a linear
manifold if for every pair of vectors $\{f, g\}$ belonging to $S$ and every pair
of scalars $\{\alpha, \beta\}$, the vector $\alpha f + \beta g$ also belongs to $S$.

Theorem 1: If the vectors $f_1, f_2, \ldots, f_n$ belong to a linear manifold
$S$ and if $\alpha_1, \alpha_2, \ldots, \alpha_n$ are arbitrary scalars, then
$\alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n \in S$.

Proof: Trivial induction based on Definition 3. 

Theorem 2: Given any non-empty collection $T$ of vectors, the set $S$
of all vectors expressible as linear combinations of members of $T$ (i.e.,
$\alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n$ for any choice of a finite set of vectors
$f_1, f_2, \ldots, f_n$ belonging to $T$ and any choice of the scalars
$\alpha_1, \alpha_2, \ldots, \alpha_n$) is a linear manifold; furthermore, $T \subseteq S$, and
if $\tilde{S}$ is any linear manifold such that $T \subseteq \tilde{S}$, then
$S \subseteq \tilde{S}$; i.e., $S$ is the smallest linear manifold containing $T$.
The manifold $S$ is said to be spanned by $T$, and is denoted $\mathfrak{M}(T)$.

Proof: Left to reader as Exercise 1. 

Theorem 3: If $T$ is a linearly independent collection of vectors,
then each member of $\mathfrak{M}(T)$ possesses a unique representation as a linear
combination of members of $T$. (Note that this theorem holds whether the
set $T$ is finite or infinite.)





Proof: Left to reader as Exercise 2. 

Definition 4: A basis of a linear space $L$ is a linearly independent
collection $T$ of vectors such that $L = \mathfrak{M}(T)$. A finite-dimensional linear
space is one which is spanned by some finite collection of vectors; otherwise
it is infinite-dimensional.

Theorem 4: (a) Every finite-dimensional linear space contains a 
basis.* 

(b) Every basis of a finite-dimensional linear space consists of a
finite number of vectors. (This seems obvious, but the proof is a bit tricky.) 
Furthermore, any two bases of the same finite-dimensional linear space 
consist of the same number of vectors. This number, being independent 
of the particular choice of basis, thus represents a property of the given 
linear space, and is termed the dimension of the space; we employ the 
obvious notation dim (L) for the dimension of L. 

(c) From every collection of vectors $S$ which spans a finite-dimensional
linear space $L$ but does not constitute a basis of $L$, it is possible to extract
a proper subset $\tilde{S}$ of $S$ which does constitute a basis of $L$.

(d) Every linearly independent subset of a finite-dimensional linear 
space L which is not a basis can be enlarged to form a basis. 

Proof: Left to reader as Exercise 3.


Theorem 5: Any two linear spaces $L_1$ and $L_2$ (over the same field
of scalars) of the same finite dimension are isomorphic; that is, it is possible
to establish a one-to-one correspondence between the members of $L_1$ and $L_2$
which preserves the operations of addition of vectors and multiplication by
scalars.


Proof: Let $n$ be the common value of dim $(L_1)$ and dim $(L_2)$.
According to Theorem 4, we can find a basis $\{f_1, f_2, \ldots, f_n\}$ of $L_1$ and a
basis $\{g_1, g_2, \ldots, g_n\}$ of $L_2$. By Theorem 3, each member $f$ of $L_1$ has a
unique representation in terms of the $f_k$'s:

$f = \alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n.$

Conversely, every choice of the scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$ determines a unique
member of $L_1$. The correspondence

$\alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n \leftrightarrow \alpha_1 g_1 + \alpha_2 g_2 + \cdots + \alpha_n g_n$

obviously provides an isomorphism between $L_1$ and $L_2$.


* This is also true of an infinite-dimensional space, but the proof requires
transfinite induction or some equivalent logical principle. (Cf. Exercise A-6.)





Indeed, this argument shows that there is, for each positive integer
$n$, essentially only one real and one complex $n$-dimensional linear space,
namely, the set of all ordered $n$-tuples of scalars.

Exercises 

1. Prove Theorem 2. 

2. Prove Theorem 3. 

3. Prove Theorem 4. (Not as trivial as it may appear! If this 

problem appears too difficult, the reader should consult a 
text on linear algebra.) 


§2. NORMED LINEAR SPACES 

The $L^p$-spaces which were considered at some length in Chapter 3
are linear spaces; this is clear from Theorems 1-1 and 2-3 of that chapter.
However, these spaces possess an additional feature: with each member
$f$ of $L^p$ is associated the real number $\|f\|_p$. We now proceed to discuss
this additional feature in an abstract setting, and later we shall make 
numerous applications of it. 

Definition 1: A normed linear space is a linear space on which is
defined a real-valued function, denoted $\|\ \|$ and termed a norm, satisfying
the following properties:

(i) $\|f\| = 0$ if $f = o$, $\|f\| > 0$ if $f \ne o$;

(ii) $\|\alpha f\| = |\alpha| \cdot \|f\|$ for every scalar $\alpha$ and every vector $f$
(homogeneity property of the norm);

(iii) $\|f + g\| \le \|f\| + \|g\|$ for every pair of vectors $\{f, g\}$.

Examples 

(a) Theorems 1-1 and 2-4 of Chapter 3 show that $L^p$ is a normed
linear space.

(b) Either $R$ or $C$, with the norm of any number defined as its
absolute value.

(c) The linear space $C([a, b])$, with $\|f\| = \max_{a \le x \le b} |f(x)|$.

(d) The linear space $C([a, b])$, with $\|f\| = \int_a^b |f(x)|\,dx$.

(e) The linear space of all ordered $n$-tuples of scalars, with
$\|(\alpha_1, \alpha_2, \ldots, \alpha_n)\| = \max \{|\alpha_1|, |\alpha_2|, \ldots, |\alpha_n|\}$.

(f) The same linear space as in (e), but with
$\|(\alpha_1, \alpha_2, \ldots, \alpha_n)\| = \left(\sum_{k=1}^{n} |\alpha_k|^p\right)^{1/p}$,
where $p$ is a finite constant $\ge 1$. (The validity of condition (iii) of
Definition 1 is established in §3-6.)





The reader will certainly note the close relationship between some
of the examples of metric spaces given in Chapter 1 on the one hand and
some of the examples which we have just listed on the other. The following
theorem makes clear that every normed linear space is endowed in a very
natural manner with the structure of a metric space. The elementary proof is
left as Exercise 1.

Theorem 1: A normed linear space becomes a metric space if the
distance between two vectors is defined as follows: $\rho(f, g) = \|f - g\|$. (It
is understood, of course, that $-g$ means $(-1)g$ and that $f - g$ means
$f + (-g)$.)

Whenever a normed linear space is treated as a metric space, it is
understood that the distance function is the one induced by the norm in
the manner explained in the preceding theorem.

Theorem 2: For any vectors $f$ and $g$, $\|f \pm g\| \le \|f\| + \|g\|$ and
$\|f \pm g\| \ge |\,\|f\| - \|g\|\,|$.

Proof: The first assertion, with the plus sign, is simply part (iii) of
Definition 1. Replacing $g$ by $-g$ we obtain $\|f - g\| \le \|f\| + \|-g\|$, and
by part (ii) of this same definition we obtain $\|-g\| = \|g\|$ (by choosing
$\alpha = -1$).

Using what we have already established, we now prove the second
part as follows: $\|f\| = \|(f \pm g) \mp g\| \le \|f \pm g\| + \|g\|$; hence
$\|f \pm g\| \ge \|f\| - \|g\|$. Interchanging $f$ and $g$ and again exploiting part
(ii) of Definition 1, we obtain $\|f \pm g\| \ge \|g\| - \|f\|$. Thus,
$\|f \pm g\| \ge \max \{\|f\| - \|g\|, \|g\| - \|f\|\} = |\,\|f\| - \|g\|\,|$.

Part (iii) of Definition 1 together with the related inequalities
established in the preceding theorem are often collectively termed the triangle
inequality.

Theorem 3: If the sequence $f_1, f_2, f_3, \ldots$ is Cauchy, the
corresponding sequence of real numbers $\|f_1\|, \|f_2\|, \|f_3\|, \ldots$ is convergent (to a
finite limit). If the first sequence converges, then
$\lim_{n \to \infty} \|f_n\| = \|\lim_{n \to \infty} f_n\|$.

Proof: Given $\epsilon > 0$, there exists $N(\epsilon)$ such that whenever $m$ and $n$
exceed $N(\epsilon)$, $\|f_m - f_n\| < \epsilon$. By the latter part of Theorem 2,

$|\,\|f_m\| - \|f_n\|\,| < \epsilon,$

and so the sequence $\|f_1\|, \|f_2\|, \|f_3\|, \ldots$ is Cauchy; but since $R$ is
complete, the latter sequence actually converges.

If the original sequence converges to the vector $f$, we obtain from
the latter part of Theorem 2 the inequality $|\,\|f\| - \|f_n\|\,| \le \|f - f_n\|$.
Letting $n \to \infty$, we see that $|\,\|f\| - \|f_n\|\,| \to 0$, or $\|f_n\| \to \|f\|$.

Although all the necessary tools are now available, we shall dispense
with the needless task of formally stating and proving a whole host of
theorems which are obviously extensions of theorems concerning sequences
of real numbers, such as that the convergence of the sequence $f_1, f_2, f_3, \ldots$
to $f$ and the convergence of $g_1, g_2, g_3, \ldots$ to $g$ imply the convergence of
the sequence $f_1 + g_1, f_2 + g_2, f_3 + g_3, \ldots$ to $f + g$.


Theorem 4: Let $N$ be a normed linear space (not necessarily
finite-dimensional) and let $f_1, f_2, \ldots, f_n$ be any finite linearly independent
collection of vectors. Then there exists a positive number $\delta$ such that, for
every choice of the scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$, the inequality

$\|\alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n\| \ge \delta(|\alpha_1| + |\alpha_2| + \cdots + |\alpha_n|)$

holds. (Roughly speaking, it is impossible for large scalars to combine so as
to furnish a small vector.)

Proof: By the homogeneity property of the norm (cf. Definition 1),
it suffices to show that $\|\alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n\|$ cannot come
arbitrarily close to zero when the $\alpha$'s satisfy the restriction
$|\alpha_1| + |\alpha_2| + \cdots + |\alpha_n| = 1$. If the theorem were false, we could
choose a sequence of vectors $g_1, g_2, g_3, \ldots$, each expressible in the form

$g_j = \alpha_{1,j} f_1 + \alpha_{2,j} f_2 + \cdots + \alpha_{n,j} f_n, \qquad \sum_{k=1}^{n} |\alpha_{k,j}| = 1,$

such that $\|g_j\| \to 0$ as $j \to \infty$. Since each of the numbers $|\alpha_{1,j}|$ is certainly
$\le 1$, we can (by the Bolzano-Weierstrass theorem) select a subsequence
of the $g_j$'s for which the corresponding subsequence of the numbers $\alpha_{1,j}$
converges to a number $\alpha_1$; then we can extract from this subsequence of
the $g_j$'s a further subsequence such that the corresponding subsequence
of the numbers $\alpha_{2,j}$ converges to a number $\alpha_2$; by a finite number of such
steps (since $n$ is finite!) we obtain a subsequence of the $g_j$'s, say
$h_1, h_2, h_3, \ldots$, where

$h_i = \beta_{1,i} f_1 + \beta_{2,i} f_2 + \cdots + \beta_{n,i} f_n, \qquad \sum_{k=1}^{n} |\beta_{k,i}| = 1,$

and $\beta_{k,i} \to \alpha_k$ as $i \to \infty$ for $1 \le k \le n$. Then clearly $h_i \to h$, where

$h = \alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n, \qquad \sum_{k=1}^{n} |\alpha_k| = 1.$

Since the $f$'s are linearly independent and the $\alpha$'s do not all vanish,
$h \ne o$; on the other hand, clearly $\|h\| = \lim_{i \to \infty} \|h_i\| = 0$, and so $h = o$!
This contradiction proves the existence of the positive number $\delta$ referred
to in the statement of the theorem.


Corollary: Let two norms, $\|\ \|$ and $\|\ \|'$, be defined on a
finite-dimensional linear space. Then there exist positive numbers $c_1$ and $c_2$ such
that, for every vector $f$, the inequalities $c_1\|f\| \le \|f\|' \le c_2\|f\|$ hold. A
subset $S$ which is open with respect to one of these norms is also open with
respect to the other norm. (Thus, all norms are topologically equivalent, in
the sense that they determine the same collection of open subsets.)


Proof: Select a basis $\{f_1, f_2, \ldots, f_n\}$ of the space, and let any vector
$f$ be expressed in the unique form $f = \alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n$. By
Theorem 4, $\|f\| \ge \delta(|\alpha_1| + |\alpha_2| + \cdots + |\alpha_n|)$; on the other hand, by
the triangle inequality we obtain $\|f\|' \le m(|\alpha_1| + |\alpha_2| + \cdots + |\alpha_n|)$,
where $m = \max \{\|f_1\|', \|f_2\|', \ldots, \|f_n\|'\}$. Hence, $\|f\|' \le (m/\delta)\|f\| = c_2\|f\|$.
The other inequality is now obtained by interchanging the roles of $\|\ \|$
and $\|\ \|'$ in the preceding argument. Now, if $S$ is open with respect to the
norm $\|\ \|$ and if $x \in S$, then there exists a positive number $r$ such that
$S_r(x) \subseteq S$, where the norm $\|\ \|$ is employed in defining the open ball
$S_r(x)$; from the previous results it is evident that, for sufficiently small
$r'$, $S'_{r'}(x) \subseteq S$, where the norm $\|\ \|'$ is now employed. Hence, $S$ is also
open with respect to the norm $\|\ \|'$. We emphasize that this corollary
fails in infinite-dimensional linear spaces (cf. Exercise 5).
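The two-sided estimate of the corollary can be observed numerically. The sketch below compares the norms of Examples (e) and (f) (with $p = 1$) on ordered triples of real numbers; the sampling scheme and the names `norm_1`, `norm_max`, and `ratio_bounds` are assumptions introduced only for this illustration, and random sampling can suggest the constants but cannot prove them.

```python
import random

def norm_1(x):
    """Sum of the absolute values of the components."""
    return sum(abs(t) for t in x)

def norm_max(x):
    """Largest absolute value among the components."""
    return max(abs(t) for t in x)

def ratio_bounds(n, trials=2000, seed=0):
    """Sample random non-zero vectors and return the smallest and largest
    observed value of norm_1(x) / norm_max(x)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        if norm_max(x) > 0:
            ratios.append(norm_1(x) / norm_max(x))
    return min(ratios), max(ratios)

smallest, largest = ratio_bounds(n=3)
# For this pair of norms one may take c1 = 1 and c2 = n.
print(smallest >= 1.0, largest <= 3.0 + 1e-12)
```

Here the constants can be read off directly, since the largest component is at most the sum and the sum is at most $n$ times the largest component.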

Theorem 5: Every finite-dimensional normed linear space is 
complete. 

Proof: Choose any basis $f_1, f_2, \ldots, f_n$ and any Cauchy sequence
$g_1, g_2, \ldots$ and express each $g_j$ in the (uniquely determined) form

$g_j = \alpha_{1,j} f_1 + \alpha_{2,j} f_2 + \cdots + \alpha_{n,j} f_n.$

Given $\epsilon > 0$, there exists $N(\epsilon)$ such that whenever $i$ and $j$ exceed $N(\epsilon)$,

$\epsilon > \|g_i - g_j\| = \left\|\sum_{k=1}^{n} (\alpha_{k,i} - \alpha_{k,j}) f_k\right\| \ge \delta \sum_{k=1}^{n} |\alpha_{k,i} - \alpha_{k,j}|.$

(The last step is justified by Theorem 4.) Hence, each of the $n$ sequences of scalars
$\alpha_{k,1}, \alpha_{k,2}, \alpha_{k,3}, \ldots$ is Cauchy and therefore convergent to limits $\alpha_k$,
$1 \le k \le n$. It then follows readily from the triangle inequality that
$\|g - g_j\| \to 0$, where $g = \alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n$. Thus, any Cauchy
sequence is, in fact, convergent, and so the normed linear space is
complete.


Theorem 6: Every finite-dimensional normed linear space is locally
compact; i.e., from every bounded sequence of vectors $g_1, g_2, g_3, \ldots$ it is
possible to choose a convergent subsequence. (Cf. Definition 10-3 and
Theorem 10-1 of Chapter 1.)

We leave the proof to the reader as Exercise 2; it is very closely 
related to the proof of the preceding theorem. It should be noticed that 
Theorem 6 is a straightforward extension of the Bolzano-Weierstrass 
theorem; indeed, it reduces to the latter when the normed linear space 
under consideration is one-dimensional. 

Corollary : Any finite-dimensional linear manifold of a normed linear 
space is closed. 

Proof: Any limit-point $f$ of the manifold is expressible as the limit
of a sequence of distinct vectors contained in the manifold. Since a
convergent sequence is Cauchy and since the manifold is complete (by
Theorem 5), the limit-point $f$ must belong to the manifold, and the latter
is therefore closed.

Definition 2: A subspace of a normed linear space $N$ is a closed
linear manifold of $N$.

Theorem 7: Let $N$ be a normed linear space, let $S$ be a subspace
which is a proper subset of $N$, and let $\alpha$ be any positive number $< 1$. Then
there exists a vector in $N$ whose norm is 1 and whose distance from $S$
exceeds $\alpha$.

Proof: Let $f$ be any vector in $N - S$; since $S$ is closed, the distance
from $f$ to $S$ is some positive number $\delta$. Hence there exists a vector $g$ in $S$
such that $\delta \le \|f - g\| < \delta/\alpha$. Now $f - g \notin S$ (why?), and
$\|f - g\| = \|(f - g) - o\|$; since $o \in S$, the distance between $f - g$ and $S$ is
$< \delta/\alpha$ and $\ge \delta$. Hence, $(f - g)/\|f - g\|$ is a vector of norm 1 whose distance
from $S$ is $\ge \delta/\|f - g\| > \delta/(\delta/\alpha) = \alpha$.

Definition 3: A normed linear space which is complete (with respect 
to the metric induced by the norm) is termed a Banach space. 

Remark: Note that the first part of the Riesz-Fischer theorem
simply asserts that each $L^p$ is a Banach space, while Theorem 6-1 of
Chapter 3 asserts the same for each $l^p$.

Theorem 8: The metric in any normed linear space is
translation-invariant; by this we mean that if $f$, $g$, and $h$ are any vectors, then
$\rho(f, g) = \rho(f + h, g + h)$.

Proof: $\rho(f, g) = \|f - g\| = \|(f + h) - (g + h)\| = \rho(f + h, g + h)$.





Definition 4: A (non-empty) subset $S$ of a linear space (not
necessarily normed) is said to be convex if for every pair of vectors $\{f, g\}$
belonging to $S$ and for every real number $\alpha$ in the interval $[0, 1]$ the vector
$\alpha f + (1 - \alpha)g$ also belongs to $S$. (Note that the scalar $\alpha$ is confined to
real values, even if the linear space is over the complex field.)


Theorem 9: Any ball, open or closed, in a normed linear space is 
convex. 

Proof: By the preceding theorem, it suffices to consider a ball
centered at the vector $o$. The vectors $f$ and $g$ belong to the closed ball
$S_r[o]$ iff $\|f\| \le r$, $\|g\| \le r$. Then, for any number $\alpha$ in the interval $[0, 1]$
we obtain $\|\alpha f + (1 - \alpha)g\| \le \|\alpha f\| + \|(1 - \alpha)g\| = \alpha\|f\| + (1 - \alpha)\|g\| \le
(\alpha + [1 - \alpha])r = r$, and so $\alpha f + (1 - \alpha)g$ also belongs to $S_r[o]$, which is
thus shown to be convex. A similar proof holds for the open ball $S_r(o)$.


As indicated in Definition 4, the concept of convexity is independent
of norm; it depends only on the linear space under consideration. On
the other hand, a particular ball, $S_r(x)$ or $S_r[x]$, will depend not only on
$r$ and $x$, but also on the particular norm that is defined on the space;
however, according to Theorem 9, the ball will be convex for any choice
of the norm.

It is instructive to consider in detail a very simple particular example.
Let $L$ be the two-dimensional real linear space consisting of all ordered
pairs of real numbers, with the usual definition of addition of vectors and
multiplication by scalars. We can visualize $L$ as the familiar plane of
analytic geometry, but to begin with only the vector-space operations are
defined; no concept of distance is involved at first. Then for each choice
of the number $p$, $1 \le p \le +\infty$, let us norm $L$ by defining the norm of
$a = (\alpha_1, \alpha_2)$ to be $\|a\|_p = (|\alpha_1|^p + |\alpha_2|^p)^{1/p}$, with the understanding
that for $p = +\infty$ the norm of $a$ is taken to be $\max \{|\alpha_1|, |\alpha_2|\}$. The
accompanying Figure 2 shows the open and closed unit balls, $S_1(o)$ and $S_1[o]$,
for $p = 1$, $p = 2$, and $p = +\infty$, the open ball consisting in each case of the
region enclosed by the correspondingly labeled curve, while the closed ball
consists of the open ball together with the curve. For $p = 1$ and $p = +\infty$
the curve is a square, while for $p = 2$ the curve is a circle. In each case
we observe that the ball (open or closed) is convex, in agreement with
Theorem 9. It is easily seen that, as $p$ increases, the ball (open or closed)
enlarges. That the ball is convex for every such $p$ is not readily shown by the
methods of elementary calculus, but Theorem 9, together with the results
of §3-6, assures this result.

It is instructive to observe what happens if we retain the preceding
definition of $\|a\|_p$ but allow $p$ to assume values between 0 and 1.
In particular, for $p = \tfrac{1}{2}$ we obtain the region bounded by the innermost
curve in Figure 2; this region is obviously not convex. A similar result is
obtained for any other choice of $p$ in $(0, 1)$, and so it follows that the
restriction $p \ge 1$ is a necessary, as well as sufficient, condition that the
defined "$p$-norm" should actually be a norm. (Cf. Exercise 3.)
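The failure of convexity for $p < 1$ can be checked with a few lines of arithmetic. The following sketch, in which the function name `p_norm` and the sample points are assumptions chosen for illustration, evaluates the expression $(|\alpha_1|^p + |\alpha_2|^p)^{1/p}$ at the midpoint of two vectors lying in the closed unit ball.

```python
def p_norm(a, p):
    """(|a1|^p + ... + |an|^p)^(1/p) for finite p > 0; max |ak| for p = inf.
    For p < 1 this expression is not a norm; it is evaluated here only
    to exhibit the failure."""
    if p == float("inf"):
        return max(abs(t) for t in a)
    return sum(abs(t) ** p for t in a) ** (1.0 / p)

a = (1.0, 0.0)
b = (0.0, 1.0)
# Midpoint of a and b, i.e. the vector (1/2)a + (1/2)b of Definition 4.
mid = tuple(0.5 * s + 0.5 * t for s, t in zip(a, b))

for p in (0.5, 1.0, 2.0, float("inf")):
    inside = p_norm(mid, p) <= 1.0
    print(p, p_norm(a, p), p_norm(b, p), inside)
```

Both `a` and `b` have "norm" 1 for every $p$, yet for $p = 1/2$ the midpoint falls outside the unit ball, whereas for $p = 1$, $p = 2$, and $p = +\infty$ it stays inside, as Theorem 9 requires.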


Exercises 

1 . Prove Theorem 1 . 

2. Prove Theorem 6. 

3. Find two ordered pairs of real numbers, $a = (\alpha_1, \alpha_2)$ and
$b = (\beta_1, \beta_2)$, such that $\|a + b\|_p > \|a\|_p + \|b\|_p$ for every $p$
satisfying $0 < p < 1$.

4. Show that for any $p$ satisfying the inequalities $1 \le p \le +\infty$ the
subset of $l^p$ consisting of sequences containing only a finite
number of non-zero entries is a normed linear space but not a
Banach space.

5. Show by means of a specific example that the corollary to Theorem 

4 fails in infinite-dimensional linear spaces. (Hint: Make use 
of Exercise 2-3 of Chapter 1.) 





§3. INNER-PRODUCT SPACES 

The reader will recall from the elements of euclidean vector analysis
the formula

$\cos\theta = \dfrac{\alpha_1\beta_1 + \alpha_2\beta_2 + \alpha_3\beta_3}{\|a\| \cdot \|b\|},$

where $\theta$ denotes the angle between the non-zero vectors $a$ and $b$ whose
cartesian components are $\alpha_1, \alpha_2, \alpha_3$ and $\beta_1, \beta_2, \beta_3$, respectively; of course,
$\|a\|$ and $\|b\|$ are the lengths of the vectors $a$ and $b$, respectively. The
definition of a normed linear space does not permit the introduction, in 
any natural manner, of the concept of the angle between two vectors. 
We now introduce the inner-product spaces, which admit, as will be seen, 
the concept of angle between any two non-zero vectors, at least in the 
case when the real field of scalars is employed. Actually, the concept of 
angle is itself of rather minor significance; it is the concept of orthog- 
onality, or perpendicularity, which is really significant, and this can be 
defined whether the real or complex scalars are employed. 

Definition 1: A complex* inner-product space is a complex linear
space $L$ together with a complex-valued function, denoted $(\ ,\ )$ and called
the inner product, defined on $L \times L$ and having the following properties:

(i) $(f, f) > 0$ if $f \ne o$, $(o, o) = 0$;

(ii) $(f, g) = \overline{(g, f)}$ for all vectors $f$, $g$;

(iii) $(f + g, h) = (f, h) + (g, h)$ for all vectors $f$, $g$, $h$;

(iv) $(\alpha f, g) = \alpha(f, g)$ for all vectors $f$, $g$ and scalars $\alpha$.

In Exercise 1 we list a number of simple but important consequences 
of this definition, which will be used frequently without comment. 

Examples 

(a) The set of all ordered $n$-tuples of scalars† with
$(a, b) = \alpha_1\bar{\beta}_1 + \alpha_2\bar{\beta}_2 + \cdots + \alpha_n\bar{\beta}_n$, where
$a = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ and $b = (\beta_1, \beta_2, \ldots, \beta_n)$.

(b) The set of all continuous complex-valued functions defined on a
compact interval $[a, b]$, with $(f, g) = \int_a^b f\bar{g}$.

(c) $L^2(A)$, with $(f, g)$ defined as in (b). (The Hölder inequality
shows that $\int f\bar{g}$ makes sense when $f$ and $g$ belong to $L^2(A)$.) As in previous
discussions involving integration, $A$ denotes any subset of $R$ possessing
positive measure ($+\infty$ permitted).

* For convenience we specifically formulate our definition for complex spaces;
with a few changes, almost all quite obvious, the remaining material
applies equally well to real spaces. A bar over a complex number denotes the conjugate
of that number, so no confusion with the previous use of the bar to signify closure of a
set can arise.

† We refrain henceforth from stating the obvious definitions of addition and
multiplication by scalars.

Theorem 1 (Schwarz inequality): For any two vectors $f$ and $g$,
$|(f, g)|^2 \le (f, f)(g, g)$, equality holding iff $f$ and $g$ are linearly dependent.

Proof: The assertion is obvious if either $f$ or $g$ is the vector $o$, for
then both sides of the inequality reduce to zero. We may therefore assume
that $f \ne o$, $g \ne o$. Consider $(f - \lambda g, f - \lambda g)$ as a function of the real
variable $\lambda$. By (i) of Definition 1, this expression is always non-negative;
since $(f - \lambda g, f - \lambda g) = (f, f) - 2\lambda \operatorname{Re}(f, g) + \lambda^2(g, g)$, we conclude
that the discriminant of the quadratic function appearing on the right is
non-positive (and real). Thus, $\{\operatorname{Re}(f, g)\}^2 \le (f, f)(g, g)$. If $(f, g) \ge 0$,
then $\operatorname{Re}(f, g) = |(f, g)|$, and we have the desired result. Otherwise, we
can choose the scalar $\alpha$ such that $|\alpha| = 1$ and $\alpha(f, g) > 0$; replacing $f$ in
the preceding argument by $\alpha f$, we obtain $\{\operatorname{Re}(\alpha f, g)\}^2 \le (\alpha f, \alpha f)(g, g) =
|\alpha|^2(f, f)(g, g) = (f, f)(g, g)$, and hence $|(\alpha f, g)|^2 \le (f, f)(g, g)$. Since
$|(\alpha f, g)|^2 = |\alpha|^2 |(f, g)|^2 = |(f, g)|^2$, we again obtain $|(f, g)|^2 \le (f, f)(g, g)$.

If we trace through the steps of the argument, we easily find that
equality holds iff $f - \lambda g = o$ for some choice of $\lambda$ (not necessarily real,
because of the use of the scalar $\alpha$ in the later stages of the argument).
Taking account of this observation and recalling the trivial cases that
were set aside at the beginning of the proof, we obtain the final assertion
of the theorem.

We give a second proof, which the reader may find instructive: As
before we may assume that $g \ne o$, so that $(g, g) > 0$. We may then form
the vector $h = f - ((f, g)/(g, g))g$. A simple calculation furnishes the
equality $(h, h) = (f, f) - |(f, g)|^2/(g, g)$; since $(h, h) \ge 0$, we conclude
that $|(f, g)|^2 \le (f, f)(g, g)$, equality holding iff $h = o$, or $f = \lambda g$, where
$\lambda = (f, g)/(g, g)$.
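The "simple calculation" of the second proof can be spot-checked numerically. The sketch below works in the space of Example (a) with $n = 3$; the function name `inner` and the particular vectors are assumptions chosen for illustration, and a single numerical instance is of course no substitute for the proof.

```python
def inner(a, b):
    """Inner product of Example (a): the sum of a_k * conjugate(b_k)."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

f = [1 + 2j, 3 - 1j, 0.5j]
g = [2 - 1j, 1j, 4 + 0j]

# The vector h = f - ((f, g)/(g, g)) g formed in the second proof.
lam = inner(f, g) / inner(g, g)
h = [x - lam * y for x, y in zip(f, g)]

lhs = inner(h, h).real
rhs = inner(f, f).real - abs(inner(f, g)) ** 2 / inner(g, g).real

print(abs(lhs - rhs) < 1e-9)                                          # the stated equality
print(abs(inner(f, g)) ** 2 <= inner(f, f).real * inner(g, g).real)   # Schwarz inequality
print(abs(inner(h, g)) < 1e-9)                                        # h is orthogonal to g
```

The last line reflects the geometric content of the construction: `h` is what remains of `f` after its component along `g` has been removed.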

Theorem 2: For any two vectors $f$ and $g$, $(f + g, f + g)^{1/2} \le
(f, f)^{1/2} + (g, g)^{1/2}$, equality holding iff $f$ and $g$ are positive multiples of
each other (excluding the trivial case that either $f$ or $g$ is the vector $o$).

Proof: Using Theorem 1, we obtain the following chain of equalities
and inequalities:

$(f + g, f + g) = (f, f) + (f, g) + (g, f) + (g, g)$
$= (f, f) + 2\operatorname{Re}(f, g) + (g, g)$
$\le (f, f) + 2|(f, g)| + (g, g)$
$\le (f, f) + 2(f, f)^{1/2}(g, g)^{1/2} + (g, g)$
$= \{(f, f)^{1/2} + (g, g)^{1/2}\}^2.$

Taking the square root of the initial and final expressions, we obtain the
desired result. Equality is seen to hold iff $\operatorname{Re}(f, g) = |(f, g)| = (f, f)^{1/2}(g, g)^{1/2}$,
and these two equalities lead to the condition stated in the
theorem.

It is now clear that $(f, f)^{1/2}$ satisfies all the conditions imposed upon
a norm, and so we use the notation $\|f\|$. Thus, every inner-product
space becomes a normed linear space, and hence a metric space, with the
understanding that $\|f\| = (f, f)^{1/2}$ and $\rho(f, g) = \|f - g\|$. We note that
in a real inner-product space the Schwarz inequality assumes the form
$-1 \le (f, g)/(\|f\| \cdot \|g\|) \le 1$, so that, exactly as in euclidean vector
analysis, we can define the angle between any two non-zero vectors, as
asserted in the first paragraph of this section.

Theorem 3: (a) For any vectors $f$ and $g$,
$\|f + g\|^2 + \|f - g\|^2 = 2\|f\|^2 + 2\|g\|^2$.

(b) For any vectors $f$ and $g$,
$(f, g) = \tfrac{1}{4}\{\|f + g\|^2 + i\|f + ig\|^2 - \|f - g\|^2 - i\|f - ig\|^2\}$.

Proof: Left as Exercise 2. 
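Before attempting the exercise, the reader may wish to test both identities on a concrete case. The sketch below does so for ordered triples of complex numbers with the inner product of Example (a); the helper names `inner`, `norm_sq`, and `add` and the sample vectors are assumptions introduced for this check only.

```python
def inner(a, b):
    """Inner product of Example (a): the sum of a_k * conjugate(b_k)."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

def norm_sq(a):
    """Squared norm (a, a), returned as a real number."""
    return inner(a, a).real

def add(a, b, scale=1):
    """Return a + scale * b, componentwise."""
    return [x + scale * y for x, y in zip(a, b)]

f = [1 + 2j, 3 - 1j, 0.5j]
g = [2 - 1j, 1j, 4 + 0j]

# Part (a): the difference of the two sides should vanish.
gap = (norm_sq(add(f, g)) + norm_sq(add(f, g, -1))
       - 2 * norm_sq(f) - 2 * norm_sq(g))

# Part (b): recover (f, g) from four squared norms.
polar = 0.25 * (norm_sq(add(f, g)) - norm_sq(add(f, g, -1))
                + 1j * norm_sq(add(f, g, 1j)) - 1j * norm_sq(add(f, g, -1j)))

print(abs(gap) < 1e-9, abs(polar - inner(f, g)) < 1e-9)
```

Part (b) shows that the inner product is completely determined by the norm it induces, which is why the signs attached to the four terms depend on the convention that the inner product is linear in its first argument.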

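Theorem 4: The inner product $(\alpha f, g)$ depends continuously on the
scalar $\alpha$ and the vectors $f$, $g$; that is, given $\epsilon > 0$ and $\alpha$, $f$, $g$, there exists
a positive number $\delta$ (depending on $\epsilon$, $\alpha$, $f$, and $g$) such that
$|(\tilde{\alpha}\tilde{f}, \tilde{g}) - (\alpha f, g)| < \epsilon$ whenever
$|\tilde{\alpha} - \alpha| < \delta$, $\|\tilde{f} - f\| < \delta$, $\|\tilde{g} - g\| < \delta$.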

Proof: Left as Exercise 3. 


Definition 2: The vectors $f$ and $g$ are said to be orthogonal if
$(f, g) = 0$. (Clearly $(f, g) = 0$ iff $(g, f) = 0$, so that orthogonality is a
symmetric relation. Also, $o$ is the only vector which is orthogonal to itself.)

Theorem 5: Let $A$ be any non-empty collection of vectors. Then
the set of all vectors orthogonal to every vector in $A$ is a subspace, called
the orthogonal complement of $A$ and denoted $A^\perp$.


Proof: Suppose $g$ and $h$ are both orthogonal to $f$. Then

$(f, \alpha g + \beta h) = \bar{\alpha}(f, g) + \bar{\beta}(f, h) = 0 + 0 = 0;$

hence $\alpha g + \beta h$ is orthogonal to $f$, and so $A^\perp$ is certainly a linear manifold.
If $g$ is a limit-point of $A^\perp$ we can find a sequence $g_1, g_2, \ldots$ in $A^\perp$ which
converges to $g$. For every vector $f$ in $A$, $(f, g) = (f, g_n) + (f, g - g_n) =
(f, g - g_n)$, since $(f, g_n) = 0$. Thus, $|(f, g)| = |(f, g - g_n)| \le
\|f\| \cdot \|g - g_n\|$. Letting $n \to \infty$ and observing that $(f, g)$ is independent of $n$,
we conclude that $(f, g) = 0$, so that $g \in A^\perp$. Thus $A^\perp$ is closed, and is
therefore a subspace.





Definition 3: A collection of vectors is said to be orthogonal if every
two distinct members of the collection are orthogonal. An orthogonal 
collection of vectors each having unit norm is called orthonormal. (A vector 
having unit norm is often termed a unit vector.) 

Theorem 6: Any orthonormal collection of vectors is linearly 
independent. 

Proof: If $f_1, f_2, \ldots, f_n$ constitute any finite set of vectors from this
collection and if $\alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n = o$, then

$0 = \|\alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n\|^2
= (\alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n,\ \alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n)
= \sum_{k=1}^{n} |\alpha_k|^2.$

(The last equality is based directly on the orthonormality of the $f$'s.)
Thus, $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$, and so the $f$'s must be linearly
independent.

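Theorem 7: If $f_1, f_2, \ldots, f_n$ are orthonormal and if $g$ is expressible
as a linear combination of the $f$'s, then the combination must be
$(g, f_1)f_1 + (g, f_2)f_2 + \cdots + (g, f_n)f_n$.

Proof: By assumption, there exist scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$ such that
$g = \alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n$. For any index $k$, $1 \le k \le n$, we obtain

$(g, f_k) = (\alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n, f_k) = \alpha_k(f_k, f_k) + \sum_{j \ne k} \alpha_j(f_j, f_k) = \alpha_k,$

since $(f_j, f_k) = 0$ if $j \ne k$.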

Theorem 8: Let $f_1, f_2, \ldots, f_n$ be any linearly independent finite
collection of vectors. Then there exists an orthonormal collection $h_1, h_2,
\ldots, h_n$ which spans the same linear manifold as that spanned by the $f$'s.

Proof: We merely sketch the principal idea, leaving the detailed
proof as Exercise 4. Let $g_1 = f_1$, $g_2 = f_2 + \alpha_{12} f_1$,
$g_3 = f_3 + \alpha_{23} f_2 + \alpha_{13} f_1$,
and so forth, where the $\alpha$'s are so chosen that the $g$'s form a collection of
orthogonal non-zero vectors. (The fact that such a choice of $\alpha$'s is
possible [and in a unique manner] rests on the linear independence of
the $f$'s and is the essential point in a rigorous proof.) Then, clearly, any
vector expressible as a linear combination of the $f$'s is also expressible as
a linear combination of $g$'s, and conversely. By setting $h_k = (1/\|g_k\|)g_k$
we obtain the desired orthonormal collection.

The procedure just described for obtaining an orthonormal collection 
of vectors spanning the same linear manifold as that spanned by a given 
(finite) collection of vectors is termed the Gram-Schmidt process. It
obviously extends to a countably infinite linearly independent collection
of vectors.
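A minimal computational sketch of the Gram-Schmidt process for ordered $n$-tuples, with the inner product of Example (a), might run as follows. The function names `inner` and `gram_schmidt` and the tolerance `1e-12` are assumptions of this sketch. It also departs slightly from the proof sketched above: each vector is normalized as soon as it is produced, and the components along the earlier $h$'s are removed one at a time, which leads to the same orthonormal collection in exact arithmetic.

```python
import math

def inner(a, b):
    """Inner product of Example (a): the sum of a_k * conjugate(b_k)."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

def gram_schmidt(fs):
    """Return an orthonormal list spanning the same manifold as fs.

    fs is assumed to be a linearly independent list of tuples or lists
    of equal length; a ValueError is raised if a vector (nearly)
    depends on its predecessors.
    """
    hs = []
    for f in fs:
        g = list(f)
        for h in hs:
            c = inner(g, h)  # component of g along the unit vector h
            g = [x - c * y for x, y in zip(g, h)]
        length = math.sqrt(inner(g, g).real)
        if length < 1e-12:
            raise ValueError("vectors are not linearly independent")
        hs.append([x / length for x in g])
    return hs

hs = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
for i, u in enumerate(hs):
    print([abs(inner(u, v) - (1.0 if i == j else 0.0)) < 1e-9
           for j, v in enumerate(hs)])
```

Every entry printed should be `True`, indicating that $(h_i, h_j)$ is 1 when $i = j$ and 0 otherwise, up to rounding error.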

Theorem 9: Let $h_1, h_2, \ldots, h_n$ be a finite orthonormal collection
of vectors and let $f$ be any vector. Then, among all choices of the scalars
$\alpha_1, \alpha_2, \ldots, \alpha_n$, the (non-negative real) quantity

$\|f - (\alpha_1 h_1 + \alpha_2 h_2 + \cdots + \alpha_n h_n)\|^2$

is minimized by the unique choice $\alpha_k = (f, h_k)$, $1 \le k \le n$, and the minimum
of the preceding expression is $\|f\|^2 - \sum_{k=1}^{n} |(f, h_k)|^2$.

Proof: Let $\alpha_k = (f, h_k) + \epsilon_k$. An elementary computation (which
must, of course, take account of the orthonormality of the $h$'s) furnishes
the equality

$\|f - (\alpha_1 h_1 + \alpha_2 h_2 + \cdots + \alpha_n h_n)\|^2 = \|f\|^2 - \sum_{k=1}^{n} |(f, h_k)|^2 + \sum_{k=1}^{n} |\epsilon_k|^2.$

The right side is clearly minimized by choosing $\epsilon_1 = \epsilon_2 =
\cdots = \epsilon_n = 0$, and any other choice will increase the right side. Hence,
the theorem is proved.

Corollary: Let $\mathfrak{M}$ be any finite-dimensional linear subspace in an
inner-product space, and let f be any vector in the space. Then there exists
a unique vector $\tilde g$ in $\mathfrak{M}$ which minimizes $\|f - g\|$ among all vectors g in $\mathfrak{M}$;
the vector $\tilde g$ is such that $f - \tilde g \in \mathfrak{M}^{\perp}$.

Proof: Construct an orthonormal basis $h_1, h_2, \ldots, h_n$ in $\mathfrak{M}$, apply
Theorem 9, and observe that the vector

$$f - \{(f, h_1)h_1 + (f, h_2)h_2 + \cdots + (f, h_n)h_n\}$$

is orthogonal to each $h_k$, $1 \le k \le n$. (The reader will find it helpful to
look upon this corollary as a generalization of the euclidean theorem
which asserts that the shortest distance from a point to a line is furnished
by the perpendicular.)
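A small numerical sketch may make Theorem 9 and its corollary concrete. It assumes vectors in $C^3$ with the usual inner product and an orthonormal pair chosen purely for illustration; the names `inner` and `best_approximation` are hypothetical.

```python
def inner(f, g):
    """Inner product on C^n, linear in the first argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(f, g))

def best_approximation(f, hs):
    """hs must be orthonormal. Returns (coefficients, projection, residual)."""
    coeffs = [inner(f, h) for h in hs]           # alpha_k = (f, h_k)
    proj = [sum(c * h[k] for c, h in zip(coeffs, hs)) for k in range(len(f))]
    resid = [x - p for x, p in zip(f, proj)]     # f minus its best approximation
    return coeffs, proj, resid

# Illustrative data: an orthonormal pair spanning a plane in 3-space.
hs = [(1.0, 0.0, 0.0), (0.0, 0.6, 0.8)]
f = (1.0, 2.0, 3.0)
coeffs, proj, resid = best_approximation(f, hs)

# Minimum of ||f - sum alpha_k h_k||^2 as stated in Theorem 9.
min_sq = inner(f, f).real - sum(abs(c) ** 2 for c in coeffs)
```

In this sketch the residual is orthogonal to each $h_k$, and its squared norm agrees with the value $\|f\|^2 - \sum |(f, h_k)|^2$ of Theorem 9.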

Corollary (Bessel's Inequality): Let $h_1, h_2, \ldots$ be any orthonormal
collection of vectors, finite or countably infinite. Then for any
vector f the inequality $\sum_k |(f, h_k)|^2 \le \|f\|^2$ holds. (If the h's constitute a
countably infinite collection, the convergence of the infinite series $\sum_k |(f, h_k)|^2$
(to a finite sum) is part of the conclusion, not an added hypothesis.)

Proof: Referring to Theorem 9 and exploiting the fact that
$\|f - (\alpha_1 h_1 + \cdots + \alpha_n h_n)\|^2 \ge 0$ for all choices of the $\alpha$'s, we
conclude that $\sum_{k=1}^{n} |(f, h_k)|^2 \le \|f\|^2$ for every positive integer n. Letting
n increase without bound, we conclude that the series $\sum_{k=1}^{\infty} |(f, h_k)|^2$
converges, and that the sum is $\le \|f\|^2$, when the collection of h's is
countably infinite. (Cf. Exercise 6.)

This corollary demonstrates, in particular, that the limiting relation
$\lim_{n \to \infty} (f, h_n) = 0$ must hold when the h's constitute a countably infinite
collection.

Exercises 

1. (a) Prove that $(o, h) = (h, o) = 0$ for every vector h, so that the
       condition $(o, o) = 0$ appearing in Definition 1 is superfluous.

   (b) Prove that $(f, \alpha g) = \bar\alpha (f, g)$ for all vectors f, g and scalars $\alpha$.

   (c) Prove that $(f, g + h) = (f, g) + (f, h)$ for all vectors f, g, h.

2. Prove Theorem 3. Why is part (a) called the parallelogram law?

3. Prove Theorem 4. 

4. Work out the detailed proof of Theorem 8. 

5. Prove that the finite collection $\{f_1, f_2, \ldots, f_n\}$ is linearly independent
   iff the expression

   $$\sum_{i,j=1}^{n} (f_i, f_j)\,\alpha_i \bar\alpha_j$$

   is positive definite, i.e., positive for all choices of the scalars
   $\alpha_1, \alpha_2, \ldots, \alpha_n$ except the trivial one $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$.

6. Suppose that the collection $\{h_i\}_{i \in I}$ is orthonormal and that the
   index-set I is uncountably infinite. Prove that, given any
   vector f, the inequality $(f, h_i) \ne 0$ holds only for a finite or
   countably infinite set of indices $i \in I$ and that $\sum |(f, h_i)|^2$
   converges to a sum $\le \|f\|^2$, where the summation is performed
   only over those indices for which $(f, h_i) \ne 0$.

7. Give an example of an inner-product space which contains an 

uncountably infinite orthonormal collection of vectors. 


§4. HILBERT SPACES 

Definition 1: A Hilbert space is an inner-product space which is
complete (in the norm induced by the inner product). (Frequently this term
is restricted to infinite-dimensional spaces, and sometimes the additional
restriction of separability is imposed.)

Examples 

(a) The Riesz-Fischer theorem guarantees the completeness of
$L^2(A)$, the inner product being that defined in Example (c) following
Definition 3-1.

(b) The results of §3-6 and §3-7 show that $l^2$ becomes a separable
infinite-dimensional Hilbert space when the inner product of the vectors
$g = (\alpha_1, \alpha_2, \alpha_3, \ldots)$ and $h = (\beta_1, \beta_2, \beta_3, \ldots)$ is defined as
$\sum_{k=1}^{\infty} \alpha_k \bar\beta_k$.
Henceforth, it is always understood that $l^2$ is a Hilbert space with the
inner product as just defined.

(c) From Exercise 2-4 it is apparent that the subset of $l^2$ consisting
of all sequences containing only a finite number of non-zero terms is an
infinite-dimensional incomplete inner-product space.

Theorem 1: Let S be a non-empty closed convex subset of a Hilbert
space H and let f be any vector in H. Then there exists a vector g in S such
that $\|f - g\| < \|f - h\|$ whenever h is any member of S other than g.
(From the very statement of the theorem it is clear that g is unique.)

Proof: By translation-invariance (cf. Theorem 2-8) of the metric
and the trivial fact that convexity is preserved under translation, we may
confine attention to the particular case that $f = o$, so that the theorem
may be restated in the simpler form: Any non-empty closed convex subset
of a Hilbert space contains a unique vector of smallest norm. Let d denote
the distance from the set S to the vector o (i.e., $d = \inf_{g \in S} \|g\|$). Then
we can choose a sequence $g_1, g_2, g_3, \ldots$ of members of S such that
$\|g_n\| \to d$. By the parallelogram law,

$$\left\|\frac{g_m - g_n}{2}\right\|^2 + \left\|\frac{g_m + g_n}{2}\right\|^2 = \tfrac12 \|g_m\|^2 + \tfrac12 \|g_n\|^2.$$

Since $\tfrac12 (g_m + g_n)$ $(= \tfrac12 g_m + (1 - \tfrac12) g_n)$ belongs to S, the second term on the left
is $\ge d^2$. Hence,

$$\left\|\frac{g_m - g_n}{2}\right\|^2 \le \tfrac12 \|g_m\|^2 + \tfrac12 \|g_n\|^2 - d^2.$$

Letting m and n increase without bound, we observe that the right side
of the last inequality approaches zero; since the left side assumes only
non-negative values, it must also approach zero. Hence the sequence
$g_1, g_2, g_3, \ldots$ is Cauchy; since the space H is complete, this sequence must
converge to a vector g in H, and since S is closed, $g \in S$. We know that,
given any convergent sequence in a normed linear space, the limit of the
norms exists and equals the norm of this limit. Therefore, $\|g\| = d$;
thus, there exists a vector g in S such that $\|g\| \le \|h\|$, where h is any
vector in S. However, if h is any vector in S other than g, $\|h\|$ must
exceed d, for if $\|h\|$ were also equal to d the parallelogram law would
furnish the equality

$$\left\|\tfrac12 (g + h)\right\|^2 + \left\|\tfrac12 (g - h)\right\|^2 = \tfrac12 \|g\|^2 + \tfrac12 \|h\|^2 = d^2,$$

and hence $\|\tfrac12 (g + h)\| < d$. Taking account of the convexity of S, we
see that this would imply the existence of a member of S whose distance
from o is strictly less than d, contradicting the definition of d. Thus, the
proof is complete.

Remarks: (a) The reader may find it helpful to make a drawing 
which illustrates (in the plane) the geometrical interpretation of the 
preceding arguments. 

(b) The full force of the convexity assumption has not been used.
We have only employed the particular case that $\alpha = 1 - \alpha = \tfrac12$. However,
the assumption that S is closed cannot be omitted. (Cf. Exercise 1.)

(c) It is reasonable to ask whether Theorem 1 continues to hold in an
arbitrary Banach space. That the answer is negative is easily seen by
reference to Figure 2 and the accompanying discussion. If the $l^{\infty}$-norm
is employed, it is evident that the distance from the vector (2, 0) to the
closed unit ball is equal to unity, and that all points on the right edge of
the ball (i.e., all vectors of the form $(1, \alpha)$, $-1 \le \alpha \le 1$) are at unit
distance from the given vector. Thus, the uniqueness part of Theorem 1
certainly fails in Banach spaces, while a more delicate example (in an
infinite-dimensional Banach space) shows that the existence part of the
theorem also fails. The reader should find the search for such an example
highly instructive.

Corollary (Projection Theorem): Let S be a subspace of a
Hilbert space H. Then every vector f in H can be expressed, in a unique
manner, in the form $f = g + h$, where $g \in S$ and $h \in S^{\perp}$.

Proof: S is non-empty, closed, and convex, and so the preceding
theorem guarantees the existence of a unique vector g in S such that
$\|f - g\| < \|f - g'\|$ for every vector $g'$ in S other than g. Let $g_1$ be any
vector in S and let $\epsilon$ be any real number. Then $g + \epsilon g_1 \in S$, and so

$$\|f - g\| \le \|f - (g + \epsilon g_1)\|, \quad\text{or}\quad \|f - g\|^2 \le \|(f - g) - \epsilon g_1\|^2.$$

Expanding out the right side (as an inner product) and recalling
the restriction of $\epsilon$ to real values, we obtain
$\|f - g\|^2 \le \|f - g\|^2 - 2\epsilon\,\mathrm{Re}\,(f - g, g_1) + \epsilon^2 (g_1, g_1)$, or
$\epsilon\{\epsilon (g_1, g_1) - 2\,\mathrm{Re}\,(f - g, g_1)\} \ge 0$.
Now suppose that $\mathrm{Re}\,(f - g, g_1) > 0$. Then, by continuity, the quantity
in { } is negative for $\epsilon$ positive and sufficiently small. Thus, the left side
would become, for such a choice of $\epsilon$, the product of a positive and a
negative factor, contradicting the inequality. Therefore, we have eliminated
the possibility that $\mathrm{Re}\,(f - g, g_1) > 0$; similarly, $\mathrm{Re}\,(f - g, g_1)$
cannot be negative, and so $\mathrm{Re}\,(f - g, g_1) = 0$. Since $g_1$ may be replaced
by $i g_1$, we obtain $\mathrm{Re}\,(f - g, i g_1) = 0$, or $\mathrm{Im}\,(f - g, g_1) = 0$, and hence
$(f - g, g_1) = 0$. (If H is a space over the real field, the latter argument
is simply omitted.) Thus, $f - g$ is orthogonal to every vector belonging
to S; setting $f - g$ equal to h, we have the desired representation of f.

We conclude this section by discussing in some detail a simple, but
not utterly trivial, problem closely related to Theorem 1. In the following
chapter we shall discuss a number of significant extensions of this problem.
Let H be the finite-dimensional complex inner-product space consisting
of all polynomials of degree $\le 100$, restricted to the interval
$[-1, 1]$, with inner product $(f, g) = \int_{-1}^{1} f \bar g$. (Needless to say, the interval
$[-1, 1]$ and the number 100 can be replaced by any bounded interval and
any positive integer, respectively.) We know by Theorem 2-5 that H is
complete, for the functions $1, x, \ldots, x^{100}$ constitute a basis. Let a be
any specified point of $[-1, 1]$ and let S be the set of all members of H
satisfying $f(a) = 1$. Clearly, S is non-empty (it contains the function
$f \equiv 1$) and convex (for if $f(a) = g(a) = 1$, then $\alpha f(a) + (1 - \alpha) g(a) = 1$,
even without the restriction that $\alpha$ be real and $0 \le \alpha \le 1$).

We now proceed to establish the remaining hypothesis of Theorem 1,
namely, that S is closed. Suppose, therefore, that $f_1, f_2, f_3, \ldots$ is any
Cauchy sequence in S. Since H is complete we know that this sequence
must converge to some vector f; i.e., there exists a vector f such that
$\|f - f_n\| \to 0$. It remains to show that f belongs to S. Consider the
collection of functions $\{(x - a), (x - a)^2, \ldots, (x - a)^{100}, 1\}$; they
obviously constitute a basis of H, and if we carry out the Gram-Schmidt
process on these functions (in the indicated order) we obtain an orthonormal
basis $\{g_1, g_2, \ldots, g_{101}\}$. Since all functions in the original
basis, except the last, vanish at a, it is clear, from the nature of the
Gram-Schmidt process, that $g_1(a) = g_2(a) = \cdots = g_{100}(a) = 0$. Furthermore,
$g_{101}(a) \ne 0$, for otherwise every linear combination of the g's,
and hence every member of H, would vanish at a, which is not true (as we
have already seen). Any function g in H must possess an expansion

$$g(x) = \left(\sum_{k=1}^{100} \alpha_k g_k(x)\right) + \alpha_{101}\, g_{101}(x),$$

and when we set $x = a$ we obtain $g(a) = \alpha_{101}\, g_{101}(a)$, or $\alpha_{101} = g(a)/g_{101}(a)$.
Thus,

$$\|g\|^2 = \sum_{k=1}^{100} |\alpha_k|^2 + |g(a)/g_{101}(a)|^2 \ge |g(a)/g_{101}(a)|^2.$$

Replacing g by $f - f_n$, we obtain

$$\|f - f_n\|^2 \ge \frac{|f(a) - f_n(a)|^2}{|g_{101}(a)|^2} = \frac{|f(a) - 1|^2}{|g_{101}(a)|^2}.$$

Thus, $|f(a) - 1| \le |g_{101}(a)| \cdot \|f - f_n\|$; letting $n \to \infty$, we obtain
$f(a) = 1$, and so $f \in S$. Thus, S is indeed closed.

We are now assured that S contains a uniquely determined vector of
smallest norm; that is, we have proved the following result: There exists
a polynomial of degree $\le 100$, assuming the value unity at $x = a$, which
possesses smaller norm than any other such function. (Cf. Exercise 2.)
In fact, from the preceding arguments we see that this extremal function
is given by

$$f(x) = \frac{g_{101}(x)}{g_{101}(a)}$$

and that its norm is given by $\|f\|^2 = 1/|g_{101}(a)|^2$, or $\|f\| = 1/|g_{101}(a)|$.

We can, in fact, give a much more concise solution of this problem
as follows. Let $\{h_1, h_2, \ldots, h_{101}\}$ be any orthonormal basis of H. Then
every member f of H can be expressed in the form $f = \sum_{k=1}^{101} \alpha_k h_k$; $\|f\|^2$ is
given by $\sum_{k=1}^{101} |\alpha_k|^2$, and the condition $f(a) = 1$ assumes the form
$\sum_{k=1}^{101} \alpha_k h_k(a) = 1$. Thus, the problem of minimizing $\|f\|$, subject to the
restrictions $f(a) = 1$ and $f \in H$, is equivalent to the purely algebraic
problem of minimizing $\sum_{k=1}^{101} |\alpha_k|^2$ subject to the constraint $\sum_{k=1}^{101} \alpha_k h_k(a) = 1$.
Referring to Example (a) following Definition 3-1 (with n = 101) and to
the Schwarz inequality, we define the vectors $b = (\alpha_1, \alpha_2, \ldots, \alpha_{101})$ and
$c = (\overline{h_1(a)}, \overline{h_2(a)}, \ldots, \overline{h_{101}(a)})$, and we obtain $1 = (b, c) = |(b, c)| \le \|b\| \cdot \|c\|$,
and hence $\|b\| \ge 1/\|c\|$, equality holding iff $b = \lambda c$ for some
scalar $\lambda$. We determine $\lambda$ by observing that $1 = (b, c) = (\lambda c, c) = \lambda (c, c)$,
or $\lambda = 1/(c, c)$. Hence, the optimum choice of b is given by $b = c/(c, c)$,
and this choice of b (and no other choice) furnishes the result $\|b\| = 1/\|c\|$.
Returning to the space H, we see that the optimum function (i.e., the
polynomial of degree $\le 100$ satisfying the condition $f(a) = 1$ and
possessing the smallest possible norm) is given by

$$f(x) = \frac{\sum_{k=1}^{101} \overline{h_k(a)}\, h_k(x)}{\sum_{k=1}^{101} |h_k(a)|^2}.$$

Furthermore, the minimum norm is given by

$$\|f\| = \left(\sum_{k=1}^{101} |h_k(a)|^2\right)^{-1/2}.$$

We emphasize that the final pair of formulas is valid for every choice
of the orthonormal basis $h_1, h_2, \ldots, h_{101}$. Since f is uniquely determined,
we obtain the remarkable result that for any two orthonormal bases of H,
say $h_1, h_2, \ldots, h_{101}$ and $g_1, g_2, \ldots, g_{101}$, the identity

$$\sum_{k=1}^{101} \overline{h_k(a)}\, h_k(x) = \sum_{k=1}^{101} \overline{g_k(a)}\, g_k(x)$$

must hold. Each side of this identity therefore defines a function $K(x, a)$
which is unambiguously determined by the Hilbert space H which has
been under discussion. Finally, we remark that, for every function h
belonging to H and for every point a in the interval $[-1, 1]$, the equality

$$h(a) = \int_{-1}^{1} h(x)\, \overline{K(x, a)}\, dx = \int_{-1}^{1} h(x)\, K(a, x)\, dx$$

must hold. (Cf. Exercise 3.)
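The closed-form solution can be evaluated numerically. The sketch below assumes one particular orthonormal basis, namely the normalized Legendre polynomials $h_k = \sqrt{(2k+1)/2}\,P_k$ (indexed here by degree $k = 0, 1, \ldots, 100$), which are real-valued, so the complex conjugates drop out. The function names are hypothetical, and the three-term Legendre recurrence is a standard fact that is not derived in the text.

```python
import math

def orthonormal_legendre(n_max, x):
    """Values h_0(x), ..., h_{n_max}(x), with h_k = sqrt((2k+1)/2) * P_k."""
    p_prev, p = 1.0, x
    values = [1.0] if n_max == 0 else [1.0, x]
    for k in range(1, n_max):
        # (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x)
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
        values.append(p)
    return [math.sqrt((2 * k + 1) / 2) * v for k, v in enumerate(values)]

def kernel(x, a, degree=100):
    """K(x, a) = sum_k h_k(a) h_k(x) for the real basis chosen above."""
    hx = orthonormal_legendre(degree, x)
    ha = orthonormal_legendre(degree, a)
    return sum(u * v for u, v in zip(ha, hx))

def extremal_value(x, a, degree=100):
    """Value at x of the minimum-norm polynomial satisfying f(a) = 1."""
    return kernel(x, a, degree) / kernel(a, a, degree)

def minimum_norm(a, degree=100):
    """The minimum norm (sum_k |h_k(a)|^2)^(-1/2)."""
    return kernel(a, a, degree) ** -0.5
```

For instance, at $a = 1$ every $P_k(1)$ equals 1, so $\sum_{k=0}^{100} (2k+1)/2 = 101^2/2$ and the minimum norm is $\sqrt{2}/101$; `minimum_norm(1.0)` reproduces this value up to rounding.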

Exercises 

1. (a) Show that in Theorem 1 neither the assumption of convexity
       nor the assumption of closedness can be omitted.

   (b) A subset S of a linear space is said to be mid-point convex if for
       every pair of vectors {f, g} belonging to S the vector $\tfrac12 (f + g)$
       also belongs to S. Give a simple example of a linear space
       and a subset which is mid-point convex but not convex.

   (c) Show that a closed subset of a normed linear space which is
       mid-point convex is, in fact, convex.

2. Let H be the inner-product space consisting of all polynomials,
   restricted to the interval $[-1, 1]$, with the usual inner product.
   Show that H is not complete, and that the problem of minimizing
   $\|f\|$ subject to the restriction $f(a) = 1$ does not have
   a solution.

3. Prove the last equation of this section. 

4. (a) Carry out the Gram-Schmidt procedure, on the interval
       $[-1, 1]$, on the set of functions $\{1, x, x^2, x^3\}$ (in the indicated
       order).

   (b) Carry out the same procedure on the same set of functions, but
       in reverse order.


§5. ORTHONORMAL BASES IN HILBERT SPACES 

In this section we consider a fixed Hilbert space H, which is assumed
to be separable and infinite-dimensional. The corresponding theory for
a finite-dimensional space will follow quite trivially from [...],
while the extension of the theory to non-separable spaces [...]
of logical niceties with which we do not [...].

Our first objective is to show that there exists a countably infinite
orthonormal collection of vectors such that every vector
in H can be approximated arbitrarily closely by a (finite) linear combination
of these vectors. By hypothesis, we can select a countable dense
subset of vectors, and we can arrange them in a sequence
$g_1, g_2, g_3, \ldots$; for convenience, we assume that $g_1 \ne o$ (for otherwise
we may simply interchange $g_1$ and [...]). [...]
of the g's in the following manner: We begin with $g_1$, and we retain
or remove $g_2$, according as the pair $\{g_1, g_2\}$ is linearly independent or
linearly dependent. Then we proceed inductively, retaining $g_n$ if it is not
expressible as a linear combination of the g's which have previously been
retained and rejecting it otherwise. The vectors which are retained are
denoted $\tilde g_1, \tilde g_2, \tilde g_3, \ldots$. We shall see later that there are infinitely many
$\tilde g$'s, but for the present we proceed as though this fact is not known. It
is evident, from the very manner in which the $\tilde g$'s are selected, that they
constitute a linearly independent collection of vectors, and that a vector
in H is expressible as a (finite) linear combination of g's iff it is expressible
as a linear combination of $\tilde g$'s. We now perform the Gram-Schmidt
procedure on the $\tilde g$'s, obtaining an orthonormal collection $\{h_1, h_2, h_3, \ldots\}$.
Again, by the nature of the Gram-Schmidt procedure, it is clear that
every linear combination of the $\tilde g$'s is a linear combination of the h's, and
conversely.

Now we proceed to show that there exist infinitely many h's. Suppose
that there were only a finite number of h's, say $h_1, h_2, \ldots, h_n$. Given
any vector f in H and any positive number $\epsilon$, there exists some vector $g_k$
such that $\|f - g_k\| < \epsilon$, since the g's form a dense subset of H. Since each
g is expressible as a linear combination of h's, it is certainly true that $g_k$
can be expressed in the form $\sum_{i=1}^{n} (g_k, h_i) h_i$ (by Theorem 3-7). Hence,
$\|f - \sum_{i=1}^{n} \alpha_i h_i\| < \epsilon$ for some choice of the $\alpha$'s, and it now follows from
Theorem 3-9 that $\|f - \sum_{i=1}^{n} (f, h_i) h_i\| < \epsilon$. Since n is fixed and $\epsilon$ is
arbitrarily small, it follows that $f = \sum_{i=1}^{n} (f, h_i) h_i$. Since f is an arbitrary
vector in H, it then follows that H is n-dimensional, contradicting our
assumption that H is infinite-dimensional. Thus, the vectors $h_1, h_2, h_3, \ldots$
constitute a countably infinite orthonormal collection.

Now, by choosing a sequence of positive numbers $\epsilon_1, \epsilon_2, \epsilon_3, \ldots$
converging to zero, we conclude, by an obvious extension of the reasoning
employed in the previous paragraph, that $\lim_{n \to \infty} \|f - \sum_{i=1}^{n} (f, h_i) h_i\| = 0$,
or $\lim_{n \to \infty} \sum_{i=1}^{n} (f, h_i) h_i = f$. From this, in turn, we obtain the equality
$\|f\|^2 = \sum_{i=1}^{\infty} |(f, h_i)|^2$, which is known as Parseval's relation and should
be compared carefully with Bessel's inequality.

While it is not true that every vector in H can be expressed as a
finite linear combination of the h's, we nevertheless refer to the collection
of h's as a basis (more accurately, an orthonormal basis) of H; we thus
contradict the definition of a basis given in §1, but this minor offense
causes no difficulty.

If in the equality $\|f\|^2 = \sum_{i=1}^{\infty} |(f, h_i)|^2$ we replace f in turn by $f + g$,
$f - g$, $f + ig$, and $f - ig$ and employ part (b) of Theorem 3-3, we obtain the
following equality, which is also known as Parseval's relation:
$(f, g) = \sum_{i=1}^{\infty} (f, h_i)\,\overline{(g, h_i)}$. (Of course, the second form of this relation reduces
to the first when g is replaced by f.) It should be evident (we dispense
with a formal proof) that an orthonormal collection of vectors $\{h_1, h_2, \ldots\}$
constitutes a basis of H iff the Parseval relation holds for every vector (or
pair of vectors) in H, and that any orthonormal collection of vectors
which fails to constitute a basis can be enlarged to a basis. Also, an
orthonormal collection of vectors constitutes a basis iff o is the only
vector orthogonal to each member of the collection.

It is interesting to note that the completeness of H has not been used 
up to this point. However, the hypothesis of completeness is needed in 
order to prove the following important result, which may be considered 
as a converse of the Parseval relation. 

Theorem 1: Let the vectors $h_1, h_2, h_3, \ldots$ constitute an orthonormal
basis in H and let the scalars $\alpha_1, \alpha_2, \alpha_3, \ldots$ be given. Then there exists a
vector f such that $(f, h_i) = \alpha_i$ for all indices i iff the series $\sum_{i=1}^{\infty} |\alpha_i|^2$
converges (to a finite sum); when this condition is satisfied the vector f is
uniquely determined and is given by the relation

$$f = \sum_{i=1}^{\infty} \alpha_i h_i = \lim_{n \to \infty} \sum_{i=1}^{n} \alpha_i h_i.$$

Proof: If such a vector f exists, the convergence of the series
$\sum_{i=1}^{\infty} |\alpha_i|^2 = \sum_{i=1}^{\infty} |(f, h_i)|^2$ is assured by Bessel's inequality. Conversely,
if the series $\sum_{i=1}^{\infty} |\alpha_i|^2$ converges, let the vectors $f_n$ be defined in the
obvious manner: $f_n = \sum_{i=1}^{n} \alpha_i h_i$, $1 \le n < \infty$. Then whenever $n > m$ the
equality $\|f_n - f_m\|^2 = \sum_{i=m+1}^{n} |\alpha_i|^2$ holds, and the assumed convergence
of the series guarantees that the sequence $f_1, f_2, f_3, \ldots$ is Cauchy.
Since H is complete, there exists a vector f such that $\|f - f_n\| \to 0$.
Given any index k, we obtain for any index n greater than k the equality

$$(f, h_k) = (f_n, h_k) + (f - f_n, h_k) = \alpha_k (h_k, h_k) + (f - f_n, h_k) = \alpha_k + (f - f_n, h_k),$$

or $(f, h_k) - \alpha_k = (f - f_n, h_k)$. Employing the Schwarz
inequality and letting n increase without bound, we obtain
$|(f, h_k) - \alpha_k| \le \|f - f_n\| \cdot \|h_k\| = \|f - f_n\| \to 0$, and so $(f, h_k) = \alpha_k$. Finally, the uniqueness
of f is easily established as follows. If g is any vector satisfying the
prescribed conditions, then $(f - g, h_k) = 0$ for all indices k, and from
the Parseval relation we obtain $\|f - g\|^2 = 0$, or $f = g$.
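The Cauchy estimate in the proof, $\|f_n - f_m\|^2 = \sum_{i=m+1}^{n} |\alpha_i|^2$, can be illustrated with two choices of coefficients. This is only a numerical sketch under assumed sequences (the names `tail_energy`, `square_summable`, and `not_square_summable` are hypothetical): for $\alpha_i = 1/i$ the tails become small, whereas for $\alpha_i = i^{-1/2}$ they do not, so no vector f can have the latter as its coefficients.

```python
def tail_energy(alpha, m, n):
    """||f_n - f_m||^2 = sum_{i=m+1}^{n} |alpha(i)|^2 for f_n = sum_{i<=n} alpha_i h_i."""
    return sum(abs(alpha(i)) ** 2 for i in range(m + 1, n + 1))

def square_summable(i):
    return 1.0 / i        # sum of 1/i^2 converges

def not_square_summable(i):
    return i ** -0.5      # sum of 1/i diverges

# Tails of the first sequence shrink as m grows; those of the second do not.
small_tail = tail_energy(square_summable, 1000, 100000)
large_tail = tail_energy(not_square_summable, 1000, 100000)
```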

We now observe that this theorem may be restated in the following
manner: There exists a one-to-one correspondence between H and $l^2$,
the vector f in H corresponding to the vector $(\alpha_1, \alpha_2, \alpha_3, \ldots)$ in $l^2$, where
$\alpha_i = (f, h_i)$. Furthermore, this correspondence obviously preserves the
vector-space operations in H and $l^2$, and also preserves inner products, for
if the vectors f and g in H correspond to the vectors $(\alpha_1, \alpha_2, \alpha_3, \ldots)$ and
$(\beta_1, \beta_2, \beta_3, \ldots)$ in $l^2$, respectively, then $(f, g) = \sum_{i=1}^{\infty} \alpha_i \bar\beta_i$. Thus, we
have established an isomorphism between H and $l^2$, and from this it
follows readily that any two separable infinite-dimensional Hilbert
spaces are isomorphic, so that, in a sense, there is only one such Hilbert
space. It should be emphasized, however, that the correspondence
between two separable infinite-dimensional Hilbert spaces is not uniquely
determined, for there is wide freedom in the selection of an orthonormal
basis in each space.





The preceding result, despite its elegance, may be misleading, for in 
many practical problems one may find it necessary to work with a specific 
Hilbert space, and, while the existence of an unlimited number of ortho- 
normal bases is assured, one may wish to find a particularly useful basis. 
We discuss this problem briefly for three particular spaces.

(a) In $l^2$ the most obvious orthonormal basis, and the most convenient
for use in many problems, is furnished by the vectors (1, 0, 0, 0, ...),
(0, 1, 0, 0, ...), (0, 0, 1, 0, ...), ....

(b) Let A denote the interval $[-1, 1]$, and let $h_1, h_2, h_3, \ldots$ be the
functions obtained by performing the Gram-Schmidt process on the
functions $1, x, x^2, x^3, \ldots$. (Cf. Exercise 4-4.) In this manner we
evidently obtain a sequence of polynomials, the polynomial $h_n$ being of
degree $n - 1$. Now, from Theorem 7-2 of Chapter 3 we know that,
given any function f belonging to $L^2(A)$ and any positive number $\epsilon$, there
exists a polynomial p such that $\|f - p\|_2 < \epsilon$. From the fact that the
collection $\{h_1, h_2, h_3, \ldots\}$ contains exactly one polynomial of each degree
(including degree zero) it is evident that p can be expressed as a linear
combination of the h's; in fact, if p is of degree k it is clear that p is
expressible as a combination of the first $k + 1$ h's. Hence, the linear
combinations of the h's form a dense subset of $L^2(A)$, and, therefore, the
h's form an orthonormal basis of $L^2(A)$.

(c) The functions $1, e^{2\pi i x}, e^{-2\pi i x}, e^{4\pi i x}, e^{-4\pi i x}, \ldots$ form an orthonormal
collection on the interval [0, 1]. Again referring to Theorem 7-2
of Chapter 3, we can, given any function f belonging to $L^2([0, 1])$ and
any positive number $\epsilon$, find a continuous function g on this interval such
that $\|f - g\|_2 < \epsilon/3$; by modifying g appropriately at one end of the
interval, if necessary, we can obtain a continuous function $\tilde g$ such that
$\tilde g(0) = \tilde g(1)$ and $\|g - \tilde g\|_2 < \epsilon/3$; finally, as shown in Appendix D,
there exists a (finite) linear combination, which we denote as t, of the
given functions, such that $|\tilde g(x) - t(x)| < \epsilon/3$ on [0, 1]. (The symbol t
is employed to suggest "trigonometric," since functions of this form are
known as trigonometric polynomials.) Hence, $\|\tilde g - t\|_2^2 = \int_0^1 |\tilde g - t|^2 < (\epsilon/3)^2$,
or $\|\tilde g - t\|_2 < \epsilon/3$. By the triangle inequality we obtain $\|f - t\|_2 < \epsilon$,
and from this result we conclude, exactly as in (b), that the given
collection of functions forms an orthonormal basis in $L^2([0, 1])$. Since this
collection of functions was the first orthonormal collection to be thoroughly
studied, beginning with Fourier, the quantities $(f, h_k)$ appearing
in the representation of a vector f in any Hilbert space as a series in
the orthonormal vectors $h_1, h_2, h_3, \ldots$ are frequently called Fourier
coefficients.

It should be emphasized, in connection with (b) and (c), that there is
no assurance that the series expansions of a given function in terms of a
particular orthonormal collection converge pointwise; only convergence
in the $L^2$-norm is guaranteed. We shall return briefly to this topic in §5-2.


Exercises 

1. Prove the projection theorem (in a separable Hilbert space) by
   introducing an orthonormal basis in the subspace upon which
   the projection is performed.

2. Let $p_1(x) = 1$, $p_2(x) = (d/dx)(x^2 - 1)$, $p_3(x) = d^2 (x^2 - 1)^2/dx^2$, ....
   Prove that $\int_{-1}^{1} p_i(x)\, p_j(x)\, dx$ $(= (p_i, p_j))$ equals zero
   if $i \ne j$. (From this it follows easily that these polynomials are
   constant multiples of the polynomials described in the discussion
   of $L^2([-1, 1])$.) Show that $p_i(1) \ne 0$ for each index
   i, so that the polynomials $p_i(x)/p_i(1)$ also constitute an
   orthogonal collection over the interval $[-1, 1]$. (These are
   the Legendre polynomials; it is standard practice to index
   them according to their degree, so that $P_0(x) = p_1(x)/p_1(1)$,
   $P_1(x) = p_2(x)/p_2(1)$, ....)

3. Let $f(x) = x - \tfrac12$ in the interval [0, 1]. Determine the Fourier
   coefficients of this function with respect to the orthonormal
   basis discussed in (c), and then, by invoking the Parseval
   relation, evaluate the sum of the series $\sum_{k=1}^{\infty} 1/k^2$.



CHAPTER 5 


LINEAR FUNCTIONALS 


The major part of the theory of linear spaces consists of the study of 
mappings of one such space into another, particularly linear mappings. 
In this chapter we shall be concerned with mappings of a given linear 
space into its field of scalars; such mappings are called functionals. 
After presenting the basic ideas, we shall develop to some extent the 
theory of linear functionals when the linear space is normed. As in 
previous chapters, we deal with real and complex spaces impartially, 
except at a few places where there is in fact some essential difference 
between the two cases. 


§1. BASIC DEFINITIONS AND CONCEPTS 

Definition 1: Let L be any linear space, and let the scalar $l(f)$ be
defined for each vector f in L. Then the function l (whose domain is L and
whose range is part or all of the field of scalars) is termed a functional (on L).
If for every pair of vectors {f, g} and for every scalar $\alpha$ the equalities
$l(f + g) = l(f) + l(g)$ and $l(\alpha f) = \alpha\, l(f)$ hold, then l is termed a linear
functional.

Examples 

(a) Let L be the class of all (real- or complex-valued) functions 
defined on the interval [0, 1], and let 1(f) = fW). Then / is a functional 
on L, but it is not linear. 





(b) Let L be as in (a), with /(/) = 2/(>) - 3/(f). Then / is a linear 
functional on L, 


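Theorem 1: (a) If l is any linear functional, $l(o) = 0$.

(b) If l is any linear functional, if $f_1, f_2, \ldots, f_n$ are any vectors,
and if $\alpha_1, \alpha_2, \ldots, \alpha_n$ are any scalars, then
$l(\alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_n f_n) = \alpha_1 l(f_1) + \alpha_2 l(f_2) + \cdots + \alpha_n l(f_n)$.

Proof: Trivial.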


Theorem 2: Let l be a linear functional defined on a finite-dimensional
linear space, let the vectors $f_1, f_2, \ldots, f_n$ constitute a basis,
and let the scalars $l(f_1), l(f_2), \ldots, l(f_n)$ be known. Then l is completely
determined. Conversely, if scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$ are chosen at pleasure,
there exists one and only one linear functional l such that $l(f_k) = \alpha_k$,
$1 \le k \le n$.


Proof: Every vector f possesses a unique representation of the form
$f = \beta_1 f_1 + \beta_2 f_2 + \cdots + \beta_n f_n$, and so $l(f)$ must equal
$\beta_1 l(f_1) + \beta_2 l(f_2) + \cdots + \beta_n l(f_n)$. Conversely, for every vector f let $l(f) = \sum_{k=1}^{n} \alpha_k \beta_k$. Then
it is trivially evident that l is a linear functional satisfying $l(f_k) = \alpha_k$,
$1 \le k \le n$.
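The construction in the converse half of the proof can be sketched as follows, assuming each vector f is represented by its coordinate tuple $(\beta_1, \ldots, \beta_n)$ relative to the chosen basis; the name `make_functional` is hypothetical.

```python
def make_functional(alphas):
    """Return the linear functional l with l(f_k) = alphas[k].

    Vectors are passed as coordinate tuples (beta_1, ..., beta_n) relative
    to the basis f_1, ..., f_n, so that l(f) = sum_k alpha_k * beta_k.
    """
    def l(betas):
        if len(betas) != len(alphas):
            raise ValueError("coordinate tuple has the wrong length")
        return sum(a * b for a, b in zip(alphas, betas))
    return l

# Illustrative prescribed values on a three-element basis.
l = make_functional([2, -1, 3j])
basis_values = [l((1, 0, 0)), l((0, 1, 0)), l((0, 0, 1))]
```

Evaluating `l` on the coordinate tuples of the basis vectors returns the prescribed scalars, and additivity and homogeneity follow from the form of the sum.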


Definition 2: Let l be any functional (not necessarily linear) on L
and let $\alpha$ be any scalar. Then $\alpha l$ is the functional defined in the obvious
manner: $(\alpha l)(f) = \alpha \cdot l(f)$. If $l_1$ and $l_2$ are any two functionals on L, the
functional $l_1 + l_2$ is defined in the obvious manner:
$(l_1 + l_2)(f) = l_1(f) + l_2(f)$.

Theorem 3: The class of all linear functionals defined on a given 
linear space L constitute a linear space {with the definitions of addition and 
multiplication by scalars given in Definition 2). 


Proof: Trivial. 


Definition 3: Let N be a normed linear space and let l be a linear
functional on N. Then l is said to be bounded if there exists a real number C
(necessarily $\ge 0$) such that $|l(f)| \le C \|f\|$ for every vector $f \in N$. If l is a
bounded linear functional, its norm is defined as the greatest lower bound
of all numbers C for which the preceding inequality holds; the norm of a
bounded linear functional, l, will be denoted $\|l\|$.


Remark: The use of the symbol $\|\ \|$ for the norm of a bounded
linear functional (and the very use of the word "norm") suggests
strongly, in conjunction with Theorem 3, that the bounded linear
functionals on N constitute a normed linear space closely related to N;
it will be seen later, in the corollary to Theorem 4, that this is indeed the
case.

Examples 

(a) Let N be the class of all (real-valued or complex-valued) continuous
functions defined on the compact interval [a, b], and let
$\|f\| = \max_{a \le x \le b} |f(x)|$. If $l(f) = f(c)$, where c is any fixed point on the interval
[a, b], then l is a bounded linear functional, and $\|l\| = 1$.

(b) Let N be the same class of functions as in (a), let $l(f)$ be defined
as in (a), but let $\|f\| = \int_a^b |f(x)|\, dx$. Then l is, of course, still a linear
functional, but it is not bounded, for the ratio $|f(c)|/\|f\|$ can evidently
be made arbitrarily large.

(c) Let N be any inner-product space, let g be any fixed vector in N,
and let $l(f) = (f, g)$ for all f in N. Then l is a bounded linear functional,
and $\|l\| = \|g\|$. (Cf. Theorem 3-1.)
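Example (b) can be made concrete with a family of "tent" functions. The sketch below assumes the interval [0, 1], the point c = 1/2, and $f_n(x) = \max(0,\, 1 - n|x - c|)$, none of which is specified in the text; the integral norm is approximated by a midpoint rule, so the computed ratios are approximate.

```python
def tent(n, c=0.5):
    """Continuous function with f(c) = 1 supported on [c - 1/n, c + 1/n]."""
    return lambda x: max(0.0, 1.0 - n * abs(x - c))

def l1_norm(f, a=0.0, b=1.0, steps=20000):
    """Midpoint-rule approximation of the integral of |f| over [a, b]."""
    h = (b - a) / steps
    return h * sum(abs(f(a + (k + 0.5) * h)) for k in range(steps))

def ratio(n, c=0.5):
    """|l(f_n)| / ||f_n|| for l(f) = f(c) and the integral norm."""
    f = tent(n, c)
    return abs(f(c)) / l1_norm(f)

ratios = [ratio(n) for n in (10, 50, 100)]
```

Each tent has $f_n(c) = 1$ and integral $1/n$ (for $n \ge 2$), so the ratio grows like n and no constant C can satisfy $|l(f)| \le C\|f\|$ for all f.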

Theorem 4: If $l_1$ and $l_2$ are bounded linear functionals defined on
the normed linear space N and if $\alpha$ is any scalar, then $\|\alpha l_1\| = |\alpha| \cdot \|l_1\|$ and
$\|l_1 + l_2\| \le \|l_1\| + \|l_2\|$.

Proof: Trivial. 

Corollary: Let N be a normed linear space and let N*, the dual space
of N, be the space consisting of all bounded linear functionals on N, normed
in accordance with Definition 3. Then N* is complete, even if N is not
complete.

Proof: Left to reader as Exercise 1. 

Theorem 5: A linear functional l defined on a normed linear space
is continuous iff l is bounded, and also iff l is continuous at o.

Proof: Left to reader as Exercise 3. 

Exercises 

1. Prove the corollary of Theorem 4. 

2. Prove that a linear functional defined on a finite-dimensional 

normed linear space is automatically bounded. 

3. Prove Theorem 5. 

§2. THE PRINCIPLE OF UNIFORM BOUNDEDNESS 

This section is devoted to a single result, the Banach-Steinhaus 
theorem, or principle of uniform boundedness, together with an application 
of this result to an interesting problem of classical analysis. 





Theorem 1: Let B be a Banach space and let $\{l_\alpha\}$ be a (non-empty)
collection of bounded linear functionals on B. Suppose that the $l_\alpha$'s are
pointwise bounded; i.e., for each $f \in B$ there exists a number K such that
$|l_\alpha(f)| \le K$ for all indices $\alpha$. (Of course, K will vary, in general, with f;
the essential point is that K does not depend on $\alpha$.) Then the $l_\alpha$'s are bounded
in norm; i.e., there exists a constant C such that $\|l_\alpha\| \le C$ for all indices $\alpha$.

Proof: For each positive integer n let $A_{n,\alpha}$ be the subset of B on
which $|l_\alpha(f)| \le n$. Since $l_\alpha$ is continuous (by Theorem 1-5), $A_{n,\alpha}$ is
closed. Hence the set $A_n = \bigcap_\alpha A_{n,\alpha}$ is also closed. (Note that $A_n$ is the
set of all vectors f for which all the numbers $|l_\alpha(f)|$ are $\le n$.) By hypothesis,
every f belongs to at least one $A_n$; e.g., if $|l_\alpha(f)| \le 2.7$ for all indices $\alpha$,
then $f \in A_3$. Thus, $B = \bigcup_{n=1}^{\infty} A_n$; since B is complete, the Baire
category theorem guarantees that at least one of the closed sets $A_n$, say
$A_N$, has non-empty interior; i.e., there exists a vector $x_0$ and a positive
number r such that, for every index $\alpha$, $|l_\alpha(x)| \le N$ whenever $\|x - x_0\| \le r$.
Letting $y = x - x_0$, we obtain $|l_\alpha(y + x_0)| \le N$ whenever $\|y\| \le r$.
Since $l_\alpha(y) = l_\alpha(y + x_0) - l_\alpha(x_0)$, we obtain
$|l_\alpha(y)| \le |l_\alpha(y + x_0)| + |l_\alpha(x_0)| \le N + N = 2N$ whenever $\|y\| \le r$. For any f except o we may
write $f = (\|f\|/r)(r/\|f\|) f$, and so

$$|l_\alpha(f)| = \frac{\|f\|}{r} \left| l_\alpha\!\left(\frac{r}{\|f\|}\, f\right) \right| \le \frac{\|f\|}{r} \cdot 2N,$$

since $\|(r/\|f\|) f\| = r$. Thus, for any vector f (the case $f = o$ is trivial)
and for any index $\alpha$, we obtain $|l_\alpha(f)| \le (2N/r)\, \|f\|$, and so $\|l_\alpha\| \le 2N/r$
for all $\alpha$.

We now proceed to apply the preceding theorem to the theory of
Fourier series.* We shall state here only those elementary ideas which
will be employed in the following development. Let $P[-\pi, \pi]$ denote the
class of continuous functions (either real-valued or complex-valued)
defined on the closed interval $[-\pi, \pi]$ satisfying $f(-\pi) = f(\pi)$ (so that $f$
can be extended continuously to all of $R$ as a function with period $2\pi$).
Clearly $P[-\pi, \pi]$ becomes a Banach space if $\|f\|$ is defined as $\max |f(x)|$
for every $f \in P[-\pi, \pi]$. With each member $f$ of the collection $P[-\pi, \pi]$
we associate the Fourier series

\[ \tfrac{1}{2}a_0 + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx), \]

* In contrast to the discussion in §4-5, we find it slightly more convenient here to
work with the interval $[-\pi, \pi]$ rather than $[0, 1]$ and to employ the trigonometric
functions rather than the exponential functions.





where

\[ a_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\cos kt\,dt \quad\text{and}\quad b_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\sin kt\,dt. \]

The most obvious questions concerning this series are: (1) Does it
converge and (2) if it does converge, is the sum of the series equal to
$f(x)$? (Of course, it is conceivable that the series converges for some
values of $x$ and not for some other values of $x$.) The answer to the second
question is affirmative: if the series converges for a particular value of $x$
it converges correctly, that is, it converges to $f(x)$. The answer to the
first question is much more complicated. For example, it is easy to prove
that if $f$ is differentiable at a particular value of $x$, then the series converges
for that value of $x$. With more effort, it can be shown that the series
converges at $x$ if the function $f$ is monotone in some neighborhood of $x$,
however small. As progress was made in proving convergence with
weaker and weaker hypotheses, it was wondered whether, in fact, the
series must converge without any hypotheses on $f$ (except the initial
hypothesis of continuity). The answer to this is no; we shall proceed to
prove that there exists a member of $P[-\pi, \pi]$ whose Fourier series
diverges at $x = 0$. (The choice $x = 0$ is purely a matter of convenience;
any other value of $x$ may be handled in a similar manner.)

When $x = 0$ the Fourier series simplifies to $\tfrac{1}{2}a_0 + a_1 + a_2 + \cdots$.
The zeroth partial sum is $\tfrac{1}{2}a_0$, the first partial sum is $\tfrac{1}{2}a_0 + a_1$, etc.
Denoting these partial sums by $l_0, l_1, l_2, \ldots$ we obviously obtain (by
referring to the preceding definition of the $a_k$'s)

\[ l_k(f) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\,\{\tfrac{1}{2} + \cos t + \cos 2t + \cdots + \cos kt\}\,dt. \]

Obviously $l_k$ is a linear functional; also, it is bounded, since

\[ |l_k(f)| \le \frac{1}{\pi}\int_{-\pi}^{\pi} (\max |f(t)|)\cdot(\tfrac{1}{2} + 1 + 1 + \cdots + 1)\,dt = (2k + 1)\,\|f\|. \]

Thus, $\|l_k\|$ is certainly not more than $(2k + 1)$. However, we shall need
a more accurate estimate on $\|l_k\|$. To obtain this estimate, we employ the
identity (cf. Exercise 1)

\[ \tfrac{1}{2} + \cos t + \cos 2t + \cdots + \cos kt = \frac{\sin (k + \tfrac{1}{2})t}{2 \sin \tfrac{1}{2}t}, \qquad k = 1, 2, 3, \ldots. \]
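The identity just quoted is easy to spot-check numerically before proving it. The following is only an illustrative sketch: the function names (cosine_sum, closed_form, max_discrepancy) and the sampling grid are our own choices, not part of the text, and a numerical check is of course no substitute for the proof requested in Exercise 1.

```python
import math

def cosine_sum(k, t):
    """Left side of the identity: 1/2 + cos t + cos 2t + ... + cos kt."""
    return 0.5 + sum(math.cos(j * t) for j in range(1, k + 1))

def closed_form(k, t):
    """Right side of the identity: sin((k + 1/2) t) / (2 sin(t / 2))."""
    return math.sin((k + 0.5) * t) / (2.0 * math.sin(0.5 * t))

def max_discrepancy(k, samples=200):
    """Largest gap between the two sides on a grid in (-pi, pi), skipping t = 0."""
    worst = 0.0
    for i in range(1, samples):
        t = -math.pi + 2 * math.pi * i / samples
        if abs(math.sin(0.5 * t)) < 1e-9:
            continue  # the closed form is a 0/0 expression at t = 0
        worst = max(worst, abs(cosine_sum(k, t) - closed_form(k, t)))
    return worst

if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(k, max_discrepancy(k))
```

The point t = 0 is skipped because the right side must be interpreted there by continuity.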

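Thus,

\[ |l_k(f)| \le \int_{-\pi}^{\pi} |f(t)|\cdot\left|\frac{\sin (k + \tfrac{1}{2})t}{2\pi \sin \tfrac{1}{2}t}\right| dt \le \|f\| \int_{-\pi}^{\pi} \left|\frac{\sin (k + \tfrac{1}{2})t}{2\pi \sin \tfrac{1}{2}t}\right| dt, \]

and so

\[ \|l_k\| \le \int_{-\pi}^{\pi} \left|\frac{\sin (k + \tfrac{1}{2})t}{2\pi \sin \tfrac{1}{2}t}\right| dt. \]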


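We shall now demonstrate that, in fact, equality must hold; we do this
by demonstrating a function $f$ (belonging to $P[-\pi, \pi]$) of unit norm such
that

\[ |l_k(f)| > \int_{-\pi}^{\pi} \left|\frac{\sin (k + \tfrac{1}{2})t}{2\pi \sin \tfrac{1}{2}t}\right| dt - \epsilon, \]

where $\epsilon$ is any prescribed positive number. Consider, for ease of exposition,
the particular case $k = 2$, and let the function $f$ be chosen as indicated in
Figure 3. Clearly, $f$ is a member of $P[-\pi, \pi]$, and if the inclined segments
are very steep, the integral

\[ \int_{-\pi}^{\pi} f(t)\,\frac{\sin (k + \tfrac{1}{2})t}{2\pi \sin \tfrac{1}{2}t}\,dt \]

[Figure 3: the function $f$ and the kernel $\frac{\sin (k + \frac{1}{2})t}{2\pi \sin \frac{1}{2}t}$]

is very close to

\[ \int_{-\pi}^{\pi} \left|\frac{\sin (k + \tfrac{1}{2})t}{2\pi \sin \tfrac{1}{2}t}\right| dt, \]

and so, for this particular $f$, $l_k(f)$ is very close to

\[ \int_{-\pi}^{\pi} \left|\frac{\sin (k + \tfrac{1}{2})t}{2\pi \sin \tfrac{1}{2}t}\right| dt. \]

Thus, the norm of the functional $l_k$ is at least

\[ \int_{-\pi}^{\pi} \left|\frac{\sin (k + \tfrac{1}{2})t}{2\pi \sin \tfrac{1}{2}t}\right| dt; \]

on the other hand, we have seen that $\|l_k\|$ is not more than this amount.
Hence, $\|l_k\|$ equals this quantity, as asserted previously.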

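Having obtained an exact expression for $\|l_k\|$ in the form of a definite
integral, we can now easily show that $\|l_k\|$ is an unbounded function of $k$;
in fact, $\|l_k\| \to +\infty$ as $k$ increases without bound. To demonstrate this
fact, we rewrite our previous result in the form

\[ \|l_k\| = \frac{1}{\pi}\int_0^{\pi} \frac{|\sin (k + \tfrac{1}{2})t|}{t}\cdot\frac{t}{\sin \tfrac{1}{2}t}\,dt. \]

For $t$ in $[0, \pi]$, $t/(\sin \tfrac{1}{2}t)$ is continuous and positive; since this interval is
compact, the minimum of $t/(\sin \tfrac{1}{2}t)$ is a strictly positive number, say $a$.
(Actually, $a = 2$, but this is of no importance.) Therefore,

\[ \|l_k\| \ge \frac{a}{\pi}\int_0^{\pi} \frac{|\sin (k + \tfrac{1}{2})t|}{t}\,dt = \frac{a}{\pi}\int_0^{(k + \frac{1}{2})\pi} \frac{|\sin u|}{u}\,du. \]

Hence (for $k > 0$),

\[ \|l_k\| \ge \frac{a}{\pi}\int_0^{k\pi} \frac{|\sin u|}{u}\,du = \frac{a}{\pi}\sum_{m=0}^{k-1}\int_{m\pi}^{(m+1)\pi} \frac{|\sin u|}{u}\,du \ge \frac{a}{\pi}\sum_{m=0}^{k-1}\frac{1}{(m+1)\pi}\int_{m\pi}^{(m+1)\pi} |\sin u|\,du = \frac{2a}{\pi^2}\sum_{m=0}^{k-1}\frac{1}{m+1}. \]

More explicitly,

\[ \|l_k\| \ge \frac{2a}{\pi^2}\Bigl(1 + \tfrac{1}{2} + \tfrac{1}{3} + \cdots + \tfrac{1}{k}\Bigr), \]

and so $\|l_k\| \to +\infty$ as $k$ increases without bound, since the harmonic
series $1 + \tfrac{1}{2} + \tfrac{1}{3} + \cdots$ is known to diverge.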
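The growth of $\|l_k\|$ can also be observed numerically. The sketch below approximates the integral expression for $\|l_k\|$ by a midpoint rule and compares it with the harmonic-sum lower bound, taking $a = 2$ as remarked in the text. The function names, the step count, and the use of the midpoint rule are our own assumptions made for illustration only.

```python
import math

def kernel_abs(k, t):
    """|sin((k + 1/2) t) / (2 pi sin(t / 2))|, with the limiting value at t = 0."""
    s = math.sin(0.5 * t)
    if abs(s) < 1e-12:
        return (2 * k + 1) / (2 * math.pi)
    return abs(math.sin((k + 0.5) * t) / (2 * math.pi * s))

def functional_norm(k, steps=20000):
    """Midpoint-rule approximation of the integral of kernel_abs over [-pi, pi]."""
    h = 2 * math.pi / steps
    return h * sum(kernel_abs(k, -math.pi + (i + 0.5) * h) for i in range(steps))

def harmonic_lower_bound(k, a=2.0):
    """(2a / pi^2) (1 + 1/2 + ... + 1/k), the lower bound derived in the text."""
    return (2 * a / math.pi ** 2) * sum(1.0 / m for m in range(1, k + 1))

if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16, 32):
        print(k, round(functional_norm(k), 4), round(harmonic_lower_bound(k), 4))
```

Both columns grow roughly like a multiple of $\log k$, which is slow but unbounded; this is all that the argument requires.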





Now suppose that for every $f$ in $P[-\pi, \pi]$ the Fourier series of $f$
converges at $x = 0$. This means that the sequence of scalars $l_0(f)$,
$l_1(f)$, $l_2(f)$, $\ldots$ converges. Since a convergent sequence of scalars is
certainly bounded, the family of bounded linear functionals $\{l_k\}$ (where
$k = 0, 1, 2, \ldots$) would be pointwise bounded, and then, according to Theorem 1,
the set of numbers $\{\|l_k\|\}$ would be bounded, contradicting the result
$\|l_k\| \to +\infty$ which we have just demonstrated. Hence, there must exist
a function $f$ in $P[-\pi, \pi]$ whose Fourier series diverges at $x = 0$.

Remarks: (a) Note that the preceding argument does not tell us how
to find a function $f$ whose Fourier series diverges at $x = 0$; it only
assures us that such a function exists. A specific example is presented in
[Titchmarsh, pp. 416-418].

(b) We have done more than show that there exists a function
whose Fourier series diverges at $x = 0$; we have shown that there exists
a function whose Fourier series diverges unboundedly at $x = 0$.

(c) By a slightly more refined argument we can even show that there
exists a function in $P[-\pi, \pi]$ whose Fourier series diverges on some
uncountable dense subset of $[-\pi, \pi]$. (See [Rudin, pp. 101-103].)

(d) Part (c) raises the question of whether there exists a function in
$P[-\pi, \pi]$ whose Fourier series diverges everywhere in $[-\pi, \pi]$. The answer
to this was not found until 1966, when Carleson showed that the Fourier
series of any function in $L^2([-\pi, \pi])$, and hence of any function in
$P[-\pi, \pi]$, converges almost everywhere. On the other hand, it was shown much
earlier, by Kolmogoroff, that there exists a function in $L^1([-\pi, \pi])$ whose
Fourier series diverges everywhere. (Since the functions $\cos x$, $\sin x$,
$\cos 2x$, $\sin 2x$, $\ldots$ are bounded, the coefficients $a_0, a_1, b_1, a_2, b_2, \ldots$ are
defined for any function $f$ in $L^1([-\pi, \pi])$, and so one can associate a
Fourier series with $f$.)

Exercises 

1. Prove the trigonometric identity previously stated, namely:

\[ \tfrac{1}{2} + \cos t + \cos 2t + \cdots + \cos kt = \frac{\sin (k + \tfrac{1}{2})t}{2 \sin \tfrac{1}{2}t}. \]

2. Prove that $\min_{0 \le t \le \pi} \dfrac{t}{\sin \tfrac{1}{2}t} = 2$.

§3. BOUNDED LINEAR FUNCTIONALS 
IN HILBERT SPACES 


The task of identifying in some reasonably explicit manner the set
of all bounded linear functionals on a given normed linear space is in
general a very difficult one; in the case of the spaces $L^p(A)$ a very
satisfying and elegant solution to this problem does exist, as we shall indicate
in the following section. The present section is devoted to obtaining the
surprisingly simple solution to this problem in the case of a Hilbert space
and to one application.

Theorem 1 (Riesz Representation Theorem):* Let $l$ be any
bounded linear functional defined on a Hilbert space $H$. Then there exists
a unique vector in $H$, which we shall denote $g_l$, such that $l(f) = (f, g_l)$ for
every vector $f$. Furthermore, $\|l\| = \|g_l\|$.

Proof: First, uniqueness is easily established, for if $g \ne h$, then
$(g - h, g - h) > 0$, and so $(g - h, g) \ne (g - h, h)$; this shows that $g$
and $h$ cannot determine the same bounded linear functional. Secondly,
if $l$ is the zero functional, the zero vector $o$ will suffice. Setting aside this
trivial case, let $S$ denote the null-space of $l$, i.e., the set of all vectors $f$
such that $l(f) = 0$. Since $l$ is linear, $S$ is a linear manifold, and since $l$ is
continuous, $S$ must be closed and, hence, a subspace of $H$. Since $l$ is not
the zero functional, there exist vectors $f$ such that $l(f) \ne 0$. We choose
any such vector, say $f_1$, and express it, in accordance with the projection
theorem, in the form

\[ f_1 = g_1 + h_1 \qquad (g_1 \in S,\ h_1 \in S^{\perp},\ h_1 \ne o). \]

Now let $f$ be any vector in $H$. Appealing once again to the projection
theorem, we write

\[ f = g + h \qquad (g \in S,\ h \in S^{\perp}). \]

Now let us consider the vector $\tilde f = f - \{l(f)/l(f_1)\}f_1$. By linearity,
$l(\tilde f) = l(f) - \{l(f)/l(f_1)\}\,l(f_1) = 0$, and therefore $\tilde f \in S$. On the other
hand,

\[ \tilde f = (g + h) - \frac{l(f)}{l(f_1)}(g_1 + h_1) = \Bigl(g - \frac{l(f)}{l(f_1)}g_1\Bigr) + \Bigl(h - \frac{l(f)}{l(f_1)}h_1\Bigr). \]

This equation clearly provides the unique decomposition of $\tilde f$ into the sum
of a vector contained in $S$ and a vector orthogonal to $S$; but since $\tilde f \in S$,
we obtain $h - \{l(f)/l(f_1)\}h_1 = o$, or

\[ l(f)\,h_1 = l(f_1)\,h. \]




* This name is also given to a much deeper theorem which provides the solution
to the problem of identifying all the bounded linear functionals on the Banach space
$C([a, b])$ provided with the maximum norm, $\|f\| = \max_{a \le x \le b} |f(x)|$.





Taking the inner product of both sides with $h_1$, we obtain

\[ l(f)\,(h_1, h_1) = l(f_1)\,(h, h_1). \]

Since $(g, h_1) = 0$, we may replace the right side of this equality by
$l(f_1)\{(g, h_1) + (h, h_1)\}$, or $l(f_1)(f, h_1)$, or $(f, \overline{l(f_1)}\,h_1)$, and so

\[ l(f) = \frac{1}{(h_1, h_1)}\,\bigl(f, \overline{l(f_1)}\,h_1\bigr) = \Bigl(f, \frac{\overline{l(f_1)}}{(h_1, h_1)}\,h_1\Bigr). \]

Letting $g_l = (\overline{l(f_1)}/(h_1, h_1))\,h_1$, we obtain $l(f) = (f, g_l)$. The proof that
$\|l\| = \|g_l\|$ is elementary, for by the Schwarz inequality, $|l(f)| =
|(f, g_l)| \le \|g_l\| \cdot \|f\|$, and so $\|l\| \le \|g_l\|$; on the other hand, $|l(g_l)| =
(g_l, g_l) = \|g_l\| \cdot \|g_l\|$, and so $\|l\| \ge \|g_l\|$. Combining these two
inequalities, we obtain $\|l\| = \|g_l\|$.
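In the finite-dimensional Hilbert space of complex $n$-tuples the representing vector can be written down directly, which makes the theorem easy to experiment with. The following sketch is only an illustration under our own assumptions: the inner product is taken to be $(f, g) = \sum_i f_i \overline{g_i}$ (linear in the first argument, as in the text), and the helper names (inner, norm, make_functional, representer) are hypothetical. It does not follow the projection argument of the proof; it simply reads off $g_l$ from the values of $l$ on the standard basis.

```python
import math

def inner(f, g):
    """Inner product (f, g) = sum of f_i * conj(g_i), linear in the first slot."""
    return sum(fi * gi.conjugate() for fi, gi in zip(f, g))

def norm(f):
    """Norm induced by the inner product."""
    return math.sqrt(inner(f, f).real)

def make_functional(coeffs):
    """The linear functional l(f) = sum of c_i * f_i on complex n-tuples."""
    return lambda f: sum(c * fi for c, fi in zip(coeffs, f))

def representer(l, n):
    """Vector g with l(f) = (f, g): its i-th entry is conj(l(e_i))."""
    g = []
    for i in range(n):
        e = [0j] * n
        e[i] = 1 + 0j          # i-th standard basis vector
        g.append(l(e).conjugate())
    return g

if __name__ == "__main__":
    l = make_functional([1 + 2j, -3j, 0.5])
    g = representer(l, 3)
    f = [2 - 1j, 1j, 4 + 0j]
    print(l(f), inner(f, g))           # the two values agree
    print(abs(l(g)), norm(g) ** 2)     # |l(g)| = ||g||^2, so ||l|| >= ||g||
```

Note the complex conjugate in representer; it is the finite-dimensional counterpart of the conjugate appearing in the formula for $g_l$.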


We shall now discuss at some length one particular application of
this simple but important theorem. Let $D$ be a bounded domain (= open
non-empty connected subset) of the plane. (With trivial modifications
the entire discussion can be extended to higher-dimensional euclidean
spaces, and the restriction to bounded domains is made purely for ease
in exposition.) A real-valued continuous function $u$ defined on $D$ is said
to be harmonic in $D$ if for every closed disc $\Delta$ contained in $D$ the mean
value of $u$ over $\Delta$ (i.e., $(\text{area of } \Delta)^{-1} \iint_\Delta u\,dx\,dy$) equals the value of $u$ at
the center of $\Delta$. Obviously, the class of all such functions is a real linear
space (with the obvious definitions of addition and multiplication by
scalars), and this statement remains true if we impose the additional
restriction that $\iint_D u^2\,dx\,dy < \infty$. (The continuity of $u$ and the
boundedness of $D$ fail to guarantee the finiteness of $\iint_D u^2\,dx\,dy$, since
$|u|$ may become large near the boundary of $D$. For example (we do not go
into the details) the function $u = (1 - x)/(1 - 2x + x^2 + y^2)$ is
harmonic in the open unit disc, $\{(x, y) \mid x^2 + y^2 < 1\}$, but the integral of
$u^2$ over this domain is infinite; note that $u$ is unbounded in the neighborhood
of the boundary-point $(1, 0)$.) Furthermore, $(u, v) = \iint_D uv\,dx\,dy$
constitutes a valid definition of an inner product; the inner-product
space thus obtained will be denoted $H(D)$, the letter $H$ suggesting the
word "harmonic."

(The reader is probably aware that the usual definition of a harmonic
function is that it is one which possesses continuous second partial
derivatives in a specified domain $D$ and satisfies Laplace's equation,
$\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2 = 0$, at all points of $D$. The equivalence of this
definition with the one which we have presented is a result of fundamental
importance in the theory of harmonic functions, but we shall have no
need to refer to Laplace's equation, or even to use the fact that the
functions with which we are dealing are differentiable.)





In fact, the inner-product space $H(D)$ turns out to be complete,
infinite-dimensional, and separable. We now proceed to prove this
assertion, although some of the details will be left as exercises. For
convenience, we present the development in a sequence of lemmas.

Lemma 1: Let $u \in H(D)$, let $P$ be any point of $D$, and let $R$ denote
the distance from $P$ to the boundary of $D$. (Since $D$ is open, $R > 0$.)
Then $|u(P)| \le (\pi R^2)^{-1/2}\,\|u\|$.

Proof: Let $\Delta$ be a closed disc centered at $P$ and possessing radius
$R'$, $0 < R' < R$. Then

\[ 0 \le \iint_\Delta (u - u(P))^2\,dx\,dy = \iint_\Delta u^2\,dx\,dy - 2u(P)\iint_\Delta u\,dx\,dy + u^2(P)\iint_\Delta dx\,dy \]
\[ = \iint_\Delta u^2\,dx\,dy - 2u(P)\{\pi R'^2 u(P)\} + \pi R'^2 u^2(P) = \iint_\Delta u^2\,dx\,dy - \pi R'^2 u^2(P). \]

Hence, $\pi R'^2 u^2(P) \le \iint_\Delta u^2\,dx\,dy \le \iint_D u^2\,dx\,dy = \|u\|^2$, and so
$u^2(P) \le (\pi R'^2)^{-1}\,\|u\|^2$, or $|u(P)| \le (\pi R'^2)^{-1/2}\,\|u\|$.
Since $R'$ may be chosen arbitrarily close to $R$, we obtain
the desired inequality. (Roughly speaking, this result shows that a
harmonic function possessing small norm must be pointwise small.)
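As a quick sanity check of Lemma 1 (an illustration of our own, not part of the original development), take for $D$ the open unit disc and $u(x, y) = x$, which clearly satisfies the mean-value property. The point $P = (p, 0)$ with $0 \le p < 1$ is an arbitrary choice made here for convenience.

```latex
\[
\|u\|^2 = \iint_D x^2\,dx\,dy
        = \int_0^1\!\!\int_0^{2\pi} r^3\cos^2\theta\,d\theta\,dr
        = \frac{\pi}{4},
\qquad R = 1 - p,
\]
\[
(\pi R^2)^{-1/2}\,\|u\| = \frac{1}{\sqrt{\pi}\,(1-p)}\cdot\frac{\sqrt{\pi}}{2}
                        = \frac{1}{2(1-p)},
\qquad
|u(P)| = p \le \frac{1}{2(1-p)} \iff 2p(1-p) \le 1 .
\]
```

Since $2p(1-p) \le \tfrac{1}{2}$ for every real $p$, the inequality of Lemma 1 holds here with room to spare; the bound becomes weak as $P$ approaches the boundary, where $R$ tends to zero.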

Lemma 2: If a sequence of harmonic functions $u_1, u_2, \ldots$ (not
necessarily belonging to $H(D)$) converges uniformly in $D$, the limit function
is also harmonic in $D$. The same result holds true if the hypothesis of
uniform convergence in $D$ is weakened to uniform convergence on every
compact subset of $D$.

Proof: Left to reader as Exercise 1. 

Lemma 3: Let $u_1, u_2, \ldots$ be a Cauchy sequence in $H(D)$. Then
this sequence converges pointwise in $D$, the limit function $u$ is also a member
of $H(D)$, and $\|u - u_n\| \to 0$ as $n \to \infty$. Thus, $H(D)$ is complete, and hence
it is a Hilbert space.

Proof: Let $A$ be any non-empty compact subset of $D$, let $r(P)$
denote the distance from any point $P$ of $D$ to the boundary of $D$, and let
$R = \min_{P \in A} r(P)$. (Since $A$ is compact, $R > 0$.) Given $\epsilon > 0$, choose $N$
so large that whenever $m$ and $n$ both exceed $N$ the inequality
$\|u_n - u_m\| < \pi^{1/2} R\,\epsilon$ holds. Then referring to Lemma 1 (with $u$ replaced
by $u_n - u_m$) we obtain, for any point $P$ in $D$, the inequality

\[ |u_n(P) - u_m(P)| \le (\pi r(P)^2)^{-1/2}\,\|u_n - u_m\| < \frac{R}{r(P)}\,\epsilon; \]

if $P$ is confined to $A$, we therefore obtain the inequality $|u_n(P) - u_m(P)| <
\epsilon$. Thus, the sequence $u_1, u_2, \ldots$ converges pointwise everywhere in $D$
and uniformly on compact subsets. By Lemma 2, the limit function,
which we denote $u$, is harmonic in $D$.




Now let $A$ again denote any compact subset of $D$, and let $\|\ \|_A$ be
assigned a meaning analogous to that of $\|\ \|$, except that integration over
$A$, rather than over $D$, is performed. Then $\|u\|_A$, $\|u_n\|_A$, and $\|u - u_n\|_A$ are all
well-defined. (Since $A$ is compact and $u$ is continuous, $u$ is bounded, and
hence quadratically integrable, on $A$.) We then obtain

\[ \|u\|_A \le \|u - u_n\|_A + \|u_n\|_A \le \|u - u_n\|_A + \|u_n\| \le \|u - u_n\|_A + \sup \|u_n\|. \]

(Recall that the norms of the vectors constituting a Cauchy sequence are
convergent, hence bounded.) Letting $n$ increase without bound and
recalling that the sequence $u_1, u_2, \ldots$ converges uniformly to $u$ on $A$, we
conclude that $\|u - u_n\|_A \to 0$, and hence $\|u\|_A \le \sup \|u_n\|$. Since the
right side of this inequality is independent of the particular choice of $A$,
we can allow $A$ to increase within $D$; in the limit we obtain the results
that $\|u\|$ is finite and that $\|u\| \le \sup \|u_n\|$.

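It remains to prove that $\|u - u_n\| \to 0$ as $n \to \infty$. Let $\epsilon > 0$, let $N$
be so chosen that $\|u_n - u_m\| < \epsilon$ whenever $m$ and $n$ exceed $N$, and let the
compact subset $A$ of $D$ be so chosen that $\|u\|_{D-A} < \epsilon$ and $\|u_{N+1}\|_{D-A} <
\epsilon$. Then, whenever $n > N$,

\[ \|u - u_n\|^2 = \|u - u_n\|_A^2 + \|u - u_n\|_{D-A}^2 \]
\[ = \|u - u_n\|_A^2 + \|u - u_{N+1} + (u_{N+1} - u_n)\|_{D-A}^2 \]
\[ \le \|u - u_n\|_A^2 + \{\|u\|_{D-A} + \|u_{N+1}\|_{D-A} + \|u_{N+1} - u_n\|_{D-A}\}^2 \]
\[ < \|u - u_n\|_A^2 + (3\epsilon)^2. \]

As before, $\|u - u_n\|_A$ approaches zero with increasing $n$, and so we
obtain the inequality $0 \le \limsup \|u - u_n\| \le 3\epsilon$. Since $\epsilon$ may be chosen
arbitrarily small, we obtain the desired result, namely $\|u - u_n\| \to 0$.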


Lemma 4: For each fixed point $Q$ of $D$, there exists a unique
member of $H(D)$, which we denote $u_Q$, such that for every $u \in H(D)$ the
following equality holds:

\[ u(Q) = (u, u_Q) = \iint_D u\,u_Q\,dx\,dy. \]

Proof: Let $l_Q(u)$ be defined, for every $u \in H(D)$, by the equation
$l_Q(u) = u(Q)$; clearly, $l_Q$ is a linear functional, while Lemma 1 shows that
$l_Q$ is bounded. Theorem 1 now guarantees the existence and uniqueness
of the function $u_Q$.

Lemma 5: H(D) is separable. 


Proof: It follows immediately from Lemma 4 that a function $u$
belonging to $H(D)$ vanishes at $Q$ iff $u$ is orthogonal to $u_Q$. Choose a
countable dense subset $\{Q_1, Q_2, Q_3, \ldots\}$ of $D$ and form the corresponding
sequence of functions $u_{Q_1}, u_{Q_2}, \ldots$. A function $u$ belonging to $H(D)$
which is orthogonal to all these functions must vanish at all the $Q_k$'s, and
then by continuity it must vanish identically. Hence, $H(D)$ is spanned by
a countably infinite collection of its vectors (cf. §4-5), and is therefore
separable.

Lemma 6: $H(D)$ is infinite-dimensional.

Proof: Left to reader as Exercise 2.


Now, if in Lemma 4 we choose $u = u_P$, where $P$ is any point of $D$,
we obtain $u_P(Q) = (u_P, u_Q) = (u_Q, u_P) = u_Q(P)$. This suggests
introducing the notation $K(P, Q)$ instead of $u_Q(P)$, for then we see that the
result which we have just obtained can be rewritten in the form $K(P, Q) =
K(Q, P)$. The equation appearing in Lemma 4 can now be written in the
form

\[ u(Q) = \iint_D u(P)\,K(P, Q)\,dx\,dy, \]

where the variables of integration $x$ and $y$ are the coordinates of $P$. From
this equation it is apparent why $K(P, Q)$ is termed the reproducing kernel
of the space $H(D)$.

Now let $u$ be any function in $H(D)$ satisfying the condition $u(Q) = 1$,
where $Q$ is any fixed point of $D$. Then from the preceding results and the
Schwarz inequality, we obtain

\[ 1 = u(Q) = (u, u_Q) = |(u, u_Q)| \le \|u\| \cdot \|u_Q\|. \]

Therefore, $\|u\| \ge 1/\|u_Q\|$, and equality holds iff $u$ is a multiple of $u_Q$;
i.e., iff $u(P) = cK(P, Q)$. Setting $P = Q$, we obtain $1 = cK(Q, Q)$, or
$c = 1/K(Q, Q)$. Thus, of all members of $H(D)$ satisfying the condition
$u(Q) = 1$, the function $K(P, Q)/K(Q, Q)$ possesses the smallest norm,
which is

\[ \frac{1}{\|u_Q\|} = \frac{1}{(u_Q, u_Q)^{1/2}} = \frac{1}{(u_Q(Q))^{1/2}} = \frac{1}{(K(Q, Q))^{1/2}}. \]

Finally, we proceed to obtain an expansion of the reproducing kernel
$K(P, Q)$ in terms of an arbitrary orthonormal basis $\{v_1, v_2, \ldots\}$.
(We know that an orthonormal basis of $H(D)$ exists, consists of a
countably infinite subset of $H(D)$, and that infinitely many choices of such a
basis can be made.) For any function $u$ in $H(D)$ we know that the
expansion

\[ u = \sum_{k=1}^{\infty} (u, v_k)\,v_k \]

holds, in the sense that $\|u - \sum_{k=1}^{n} (u, v_k)v_k\| \to 0$ with increasing $n$. If we
denote the finite sum $\sum_{k=1}^{n} (u, v_k)v_k$ as $u_n$, then $\|u - u_n\| \to 0$, and by
earlier results we conclude that the sequence $u_1(P), u_2(P), \ldots$
converges pointwise to $u(P)$ for all $P \in D$ and that the convergence is
uniform on each compact subset of $D$. Thus we may write

\[ u(P) = \sum_{k=1}^{\infty} (u, v_k)\,v_k(P) \]

with the assurance that the infinite series appearing on the right actually
converges. If, in particular, we choose for $u$ the function $u_Q$, we obtain

\[ u_Q(P) = K(P, Q) = \sum_{k=1}^{\infty} (u_Q, v_k)\,v_k(P) = \sum_{k=1}^{\infty} v_k(Q)\,v_k(P). \]

This expansion (which explicitly demonstrates that the reproducing
kernel is a symmetric function of its two arguments) is especially
remarkable in that it holds for every choice of the orthonormal basis $v_1, v_2, \ldots$.
(The reader may find it instructive at this point to review the problem
discussed in §4-4.)
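The expansion can be tried out on the unit disc. In polar coordinates the functions $\pi^{-1/2}$, $((2n+2)/\pi)^{1/2} r^n \cos n\theta$, and $((2n+2)/\pi)^{1/2} r^n \sin n\theta$ ($n = 1, 2, \ldots$) are orthonormal in $H(D)$, as a direct integration shows; that they also form a basis is an assumption we make here without proof. Under that assumption the expansion gives $K(P, Q) = \pi^{-1} + \sum_{n \ge 1} ((2n+2)/\pi)(r\rho)^n \cos n(\theta - \varphi)$, where $(r, \theta)$ and $(\rho, \varphi)$ are the polar coordinates of $P$ and $Q$. The sketch below (function names, truncation level, and quadrature grid are our own choices) evaluates the truncated series and tests the reproducing property by a midpoint rule.

```python
import math

def kernel(P, Q, terms=200):
    """Truncated series for K(P, Q) on the unit disc (basis completeness assumed)."""
    r, th = math.hypot(*P), math.atan2(P[1], P[0])
    s, ph = math.hypot(*Q), math.atan2(Q[1], Q[0])
    total = 1.0 / math.pi
    for n in range(1, terms + 1):
        total += (2 * n + 2) / math.pi * (r * s) ** n * math.cos(n * (th - ph))
    return total

def reproduce(u, Q, nr=60, nth=120, terms=40):
    """Midpoint-rule approximation of the double integral of u(P) K(P, Q) over the disc."""
    dr, dth = 1.0 / nr, 2 * math.pi / nth
    acc = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(nth):
            th = (j + 0.5) * dth
            P = (r * math.cos(th), r * math.sin(th))
            acc += u(P) * kernel(P, Q, terms) * r * dr * dth  # r dr dtheta is the area element
    return acc

if __name__ == "__main__":
    Q = (0.3, 0.2)
    print(reproduce(lambda P: P[0], Q))        # close to u(Q) = 0.3 for u(x, y) = x
    print(kernel((0.0, 0.0), (0.0, 0.0)), 1 / math.pi)
```

The value $K(O, O) = 1/\pi$ at the center $O$ agrees with the mean-value property over the whole disc, and $1/K(O, O)^{1/2} = \pi^{1/2}$ is the norm of the constant function 1, in agreement with the minimum-norm result obtained above. The truncation level would have to be raised for points $Q$ close to the boundary.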


Exercises 

1. Prove Lemma 2. 

2. Prove that the functions $\mathrm{Re}\,(x + iy)^n$, $n = 0, 1, 2, 3, \ldots$, are
harmonic and linearly independent in any domain; since they
are bounded in any bounded domain $D$, it follows that $H(D)$
is infinite-dimensional.

3. Let $Q_1, Q_2, \ldots, Q_n$ be distinct points in the bounded domain $D$.
Show that the functions $u_{Q_1}, u_{Q_2}, \ldots, u_{Q_n}$ are linearly
independent.

4. Let $Q_1, Q_2, \ldots, Q_n$ be distinct points in the bounded domain $D$
and let the (real) constants $a_1, a_2, \ldots, a_n$ be given. Show
that the problem of minimizing $\|u\|$ (in $H(D)$) subject to the
interpolation conditions $u(Q_k) = a_k$, $1 \le k \le n$, possesses a
unique solution.

5. Let $D$ and $\tilde D$ be bounded plane domains, with reproducing
kernels $K$ and $\tilde K$, respectively. Suppose that $Q \in D \subset \tilde D$.
Prove that $K(Q, Q) \ge \tilde K(Q, Q)$.

6. A sequence $g_1, g_2, \ldots$ of vectors in a Hilbert space $H$ is said
to converge weakly if for every vector $f$ in $H$ the sequence of
scalars $(f, g_1), (f, g_2), (f, g_3), \ldots$ is convergent.

(a) Prove that a convergent sequence of vectors is weakly convergent.

(b) Prove that a necessary condition for a sequence of vectors to
converge weakly is that their norms be bounded.

(c) Demonstrate a sequence of vectors which converges weakly but
does not converge.


§4. THE DUAL SPACE OF $L^p(A)$

Theorem 3-1 may be stated, somewhat imprecisely, in the following
form: If $H$ is any Hilbert space, then $H = H^*$. We now proceed to
show that, in a sense to be made quite precise, $(L^p(A))^* = L^q(A)$ if
$1 \le p < \infty$. (The reader may find it helpful to refer back to the definition
of $L^p(A)$ and to the definition of $N^*$, where $N$ is any normed linear space.)

Let $p$ be fixed, $1 \le p \le +\infty$, let the corresponding conjugate
number $q$ be determined, and let $g$ be any member of $L^q$. (As in Chapter
2, we frequently use the notations $L^p$, $L^q$ instead of $L^p(A)$, $L^q(A)$.)
According to the Hölder inequality, $|\int_A fg| \le \|g\|_q \cdot \|f\|_p$ for every $f$ in $L^p$.
Since the equality $\int_A (\alpha f_1 + \beta f_2)g = \alpha\int_A f_1 g + \beta\int_A f_2 g$ holds for every pair of
functions $\{f_1, f_2\}$ in $L^p$ and every pair of scalars $\{\alpha, \beta\}$, we see that the
equation $l_g(f) = \int_A fg$ defines a bounded linear functional on $L^p$;
furthermore, by reference to Exercise 2-2 of Chapter 3 it is easily seen that
$\|l_g\| = \|g\|_q$. Thus, each member $g$ of $L^q$ determines a member of $(L^p)^*$,
and the correspondence thus defined is norm-preserving. In particular,
$l_g$ is the zero functional iff $\|g\|_q = 0$, which in turn is equivalent to the
condition $g(x) = 0$ a.e. Thus, distinct members of $L^q$ determine distinct
members of $(L^p)^*$. Furthermore, the correspondence is a linear one;
this simply means that if $\alpha$ and $\beta$ are any scalars and if $g_1$ and $g_2$ are any
members of $L^q$, then $l_{\alpha g_1 + \beta g_2} = \alpha\,l_{g_1} + \beta\,l_{g_2}$.
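The norm-preserving property has a simple finite-dimensional analogue that can be checked by direct computation. The sketch below works with real $n$-tuples and sums in place of functions and integrals (a discrete stand-in of our own choosing, not the space $L^p(A)$ itself), and assumes $1 < p < \infty$ and $g \ne o$. It constructs the vector $f_i = \mathrm{sign}(g_i)\,|g_i|^{q-1}$, rescaled to unit $p$-norm, for which $\sum f_i g_i$ equals $\|g\|_q$, so that the Hölder bound is attained. All function names are hypothetical.

```python
import math

def p_norm(v, p):
    """(sum |v_i|^p)^(1/p) for a finite real sequence v."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def conjugate_exponent(p):
    """The number q with 1/p + 1/q = 1 (for 1 < p < infinity)."""
    return p / (p - 1.0)

def extremal_vector(g, p):
    """Unit-p-norm vector f with sum f_i g_i = ||g||_q; assumes g is not the zero vector."""
    q = conjugate_exponent(p)
    f = [math.copysign(abs(x) ** (q - 1.0), x) for x in g]
    scale = p_norm(f, p)
    return [x / scale for x in f]

if __name__ == "__main__":
    g = [3.0, -1.0, 0.5, -2.0]
    p = 3.0
    q = conjugate_exponent(p)
    f = extremal_vector(g, p)
    print(p_norm(f, p))                                   # unit p-norm
    print(sum(x * y for x, y in zip(f, g)), p_norm(g, q)) # the two values agree
```

The computation rests on the relation $(q - 1)p = q$, which gives $\|f\|_p^p = \sum |g_i|^q$ before rescaling.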

The preceding remarks obviously suggest the following question:
Given a member $l$ of $(L^p)^*$, does there exist a member $g$ of $L^q$ such that
$l = l_g$? The remarkable answer to this question is in the affirmative for
$1 \le p < +\infty$ but in the negative for $p = +\infty$. A complete justification
of this assertion would require the development of a portion of the theory
of Lebesgue integration that we have not presented in Chapter 2,
namely, the determination of the precise conditions under which a
function $f$, defined on some interval, shall be the indefinite integral of
some (summable) function $F$ and the answer to the converse question:
If $F$ is summable in an interval $[a, b]$ and if $f(x) = \int_a^x F$ for $x \in [a, b]$, in
what sense does the familiar equality $f'(x) = F(x)$, easily established
when $F$ is continuous, carry over to the present case?

First we shall present, through Definition 1 and Theorem 1, the 
answers to the two previously stated problems, but we shall not give the 
proof. Then we shall indicate briefly how to settle the question posed at 
the beginning of the preceding paragraph. 





Definition 1: Let $f$ be a (real-valued or complex-valued) function
defined on an interval (open or closed, bounded or unbounded). The function
$f$ is said to be absolutely continuous if for every $\epsilon > 0$ there exists a $\delta > 0$
such that for every finite disjoint collection of intervals $\{(a_1, b_1), (a_2, b_2), \ldots,
(a_n, b_n)\}$ with total length less than $\delta$ the inequality $\sum_{k=1}^{n} |f(b_k) - f(a_k)| < \epsilon$
holds. (Of course, if $f$ is absolutely continuous on some interval it is also
absolutely continuous on any subinterval.)

Theorem 1: If $f$ is absolutely continuous on a bounded closed
interval $[a, b]$, then the derivative $f'$ exists almost everywhere. Furthermore,
$f'$ is summable on the interval, and $\int_a^x f' = f(x) - f(a)$ for every $x \in [a, b]$.
Conversely, if $F \in L^1([a, b])$, the indefinite integral of $F$, defined for all
$x \in [a, b]$ by the equation

\[ f(x) = \int_a^x F, \]

is absolutely continuous, and $f'(x) = F(x)$ almost everywhere in $[a, b]$.


Now we can return to the main problem of this section. For ease in
exposition, we choose for $A$ the interval $[0, 1]$ and confine attention to
the real, rather than the complex, $L^p(A)$; that is to say, the members of
$L^p(A)$ are (equivalence classes of) real-valued functions and $l(f)$ is, for
each $f$ in $L^p(A)$, a real number. (The extension of our results to any other
measurable set $A$ and to the complex $L^p(A)$ is a task of very minor
difficulty.) For each number $a$ in $A$, let $h_a$ be the characteristic function
of the interval $[0, a]$, and let $H(a) = l(h_a)$. (Note carefully that there is a
function $h_a$ for each $a$, while $H$ is just one function.) More generally,
let $h_{a,b}$ be the characteristic function of the interval $[a, b]$, where it is
always understood that $0 \le a < b \le 1$. Note that the equality $h_{a,b}(x) =
h_b(x) - h_a(x)$ holds almost everywhere (in fact, everywhere except at $a$),
so that $l(h_{a,b}) = l(h_b) - l(h_a) = H(b) - H(a)$.

The continuity of the function $H$ is immediate: For $b > a$,

\[ |H(b) - H(a)| = |l(h_{a,b})| \le \|l\| \cdot \Bigl\{\int_a^b 1^p\Bigr\}^{1/p} = \|l\| \cdot (b - a)^{1/p}, \]

which approaches zero as $b - a$ does so. In order to show that $H$ is, in
fact, absolutely continuous, select any finite collection of disjoint intervals
(it really makes no difference whether they are open, closed, or half-open),
$\{(a_1, b_1), (a_2, b_2), \ldots, (a_n, b_n)\}$. Define a function $f$ as follows: $f = +1$
everywhere in the interval $(a_k, b_k)$ if $H(b_k) - H(a_k) \ge 0$, $f = -1$
everywhere in the interval $(a_k, b_k)$ if $H(b_k) - H(a_k) < 0$, and $f = 0$ outside the
union of the given collection of intervals. Obviously $f \in L^p(A)$ and

\[ \|f\|_p = \Bigl\{\sum_{k=1}^{n} (b_k - a_k)\Bigr\}^{1/p}; \]

furthermore,

\[ l(f) = l\Bigl(\sum_{k=1}^{n} \pm h_{a_k,b_k}\Bigr) = \sum_{k=1}^{n} \pm\{l(h_{b_k}) - l(h_{a_k})\} = \sum_{k=1}^{n} |H(b_k) - H(a_k)|. \]

But $|l(f)| \le \|l\| \cdot \|f\|_p$, and so

\[ \sum_{k=1}^{n} |H(b_k) - H(a_k)| \le \|l\| \cdot \Bigl\{\sum_{k=1}^{n} (b_k - a_k)\Bigr\}^{1/p}. \]

Given any $\epsilon > 0$, let $\delta = (\epsilon/\|l\|)^p$. Then whenever $\sum_{k=1}^{n} (b_k - a_k)$ is less
than $\delta$, the quantity $\|l\| \cdot \{\sum_{k=1}^{n} (b_k - a_k)\}^{1/p}$ is less than $\|l\| \cdot \delta^{1/p}$, which
is exactly $\epsilon$. Thus, $\sum_{k=1}^{n} |H(b_k) - H(a_k)| < \epsilon$, and so $H$ is absolutely
continuous. (Note carefully that the disjointness of the chosen set
of intervals and the finiteness of $p$ are both essential in the preceding
argument.)

Now, by Theorem 1, $H'$ exists almost everywhere, and $H(b) - H(a) =
\int_a^b H'$. If $f$ is the characteristic function of the interval $(a, b)$, the equalities
$l(f) = H(b) - H(a) = \int_a^b H' = \int_a^b fH' = \int_0^1 fH'$ must hold. (The last
equality follows from the fact that $f$ vanishes outside $(a, b)$.) If $E$ is the
union of a finite collection of open intervals, then by an obvious extension
of the preceding argument we obtain the equality

\[ l(\chi_E) = \int_0^1 \chi_E H'. \qquad (4\text{-}1) \]

(Recall that $\chi_E(y)$ equals 1 or 0, according as $y$ does or does not belong
to $E$.) By an easy passage to the limit we find that (4-1) continues to hold
if $E$ is the union of a countably infinite collection of open intervals;
hence, (4-1) holds for all open subsets of $A$ ($= [0, 1]$). Using the fact
that every measurable set is almost open (in the sense made precise by
the definition of measurability), we can establish (4-1) whenever $E$ is
measurable. Then by taking (finite) linear combinations of characteristic
functions we obtain, for any simple function* $s$, the obvious generalization
of (4-1):

\[ l(s) = \int_0^1 sH'. \qquad (4\text{-}2) \]

* Here, in contrast to Chapter 2, we may obviously permit $s$ to assume negative
values.





By an easy limiting operation we then obtain, for any bounded measurable
function $f$ (which clearly belongs to $L^p(A)$ for all values of $p$, including
$+\infty$), the formula

\[ l(f) = \int_0^1 fH'. \qquad (4\text{-}3) \]

Finally, by a more delicate limiting operation we can show that (4-3)
continues to hold for the unbounded members of $L^p(A)$.

Replacing $H'$ by $g$, we obtain the desired representation of the
bounded linear functional $l$, namely

\[ l(f) = \int_0^1 fg. \qquad (4\text{-}4) \]

However, it still remains to show that $g \in L^q(A)$; Theorem 1 guarantees
only that $g \in L^1(A)$. We present this argument in some detail, since it
provides a splendid example of the power of the principle of uniform
boundedness. Let the sequence of functions $g_1, g_2, g_3, \ldots$ be defined as
the truncates of $g$: $g_n(x) = -n$, $g(x)$, or $n$ according as $g(x) < -n$,
$|g(x)| \le n$, or $g(x) > n$, and for each $f$ in $L^p(A)$ let

\[ l_n(f) = \int_0^1 fg_n. \qquad (4\text{-}5) \]

Each $l_n$ is a bounded linear functional, for, by the Hölder inequality,
$|l_n(f)| \le \|f\|_p\,\|g_n\|_q \le n\,\|f\|_p$. Splitting $A$ into the disjoint sets
$A_+$ and $A_-$ on which $fg \ge 0$ and $fg < 0$, respectively, we can easily show,
by invoking the monotone convergence theorem, that $\lim_{n \to \infty} \int_0^1 fg_n$ exists
and equals $\int_0^1 fg$. Thus, the collection of numbers $\{l_n(f)\}$ is bounded at
each point of $L^p(A)$, and since $L^p(A)$ is complete we are now assured that
the set of norms $\{\|l_n\|\}$ is bounded. Since $\|l_n\|$ is known to equal $\|g_n\|_q$, we
have shown that the set of norms $\{\|g_n\|_q\}$ is bounded. By another
application of the monotone convergence theorem (where this time we split $A$
into the subsets on which $g \ge 0$ and $g < 0$) we conclude that $\|g\|_q =
\lim_{n \to \infty} \|g_n\|_q < \infty$. Thus, we have sketched the proof of the following
remarkable theorem.

Theorem 2: For $1 \le p < +\infty$, $(L^p)^* = L^q$, where the equality is
to be understood in the manner explained in the third paragraph of this
section.

For a fully detailed proof of Theorems 1 and 2, the reader is referred 
to [Royden, pp. 94-107 and 119-123]. 
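The truncation step in the preceding sketch can be followed explicitly in a simple case. The example below is our own illustration, with the particular choices $g(x) = x^{-1/3}$ on $(0, 1]$ and $q = 2$ made only for convenience; since $g \ge 0$, the truncate $g_n$ equals $n$ where $x < n^{-3}$ and equals $g$ elsewhere.

```latex
\[
g_n(x) = \min\{x^{-1/3},\, n\}, \qquad
\|g_n\|_2^2 = \int_0^{n^{-3}} n^2\,dx + \int_{n^{-3}}^{1} x^{-2/3}\,dx
            = \frac{1}{n} + 3\Bigl(1 - \frac{1}{n}\Bigr) = 3 - \frac{2}{n},
\]
\[
\lim_{n\to\infty} \|g_n\|_2^2 = 3 = \int_0^1 x^{-2/3}\,dx = \|g\|_2^2 .
\]
```

The norms $\|g_n\|_2$ increase with $n$ and remain bounded, and their limit is $\|g\|_2$, exactly as the monotone convergence theorem predicts; in the general argument the boundedness is not computed but supplied by the principle of uniform boundedness.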

We have thus shown (except for the gaps in the argument that have
been indicated) that there exists a one-to-one correspondence between
$(L^p)^*$ and $L^q$ which preserves both the linear structure and the metric
structure of these two spaces, so that, when looked upon simply as normed
linear spaces, without regard for the nature of the vectors which constitute
the two spaces, they are indistinguishable.

The case $p = 2$ has already been covered by Theorem 3-1, for $L^2$
becomes a Hilbert space with the definition $(f, g) = \int f\bar g$ for the inner
product. A slight confusion may be caused by the presence of $\bar g$, rather
than $g$. This is readily cleared up by observing that if, according to
Theorem 3-1, the bounded linear functional $l$ is associated with the vector
$g$, then the bounded linear functional $\alpha l$ is associated with the vector $\bar\alpha g$,
not with the vector $\alpha g$. Thus, strictly speaking, Theorem 3-1 furnishes a
conjugate isomorphism, rather than an isomorphism, between $L^2$ and
$(L^2)^*$. (Of course, this complication does not arise if we are dealing with
real-valued, rather than complex-valued, functions.)

We leave to the reader, as Exercise 2, the task of proving the following 
analogue of Theorem 2. 

Theorem 3: For $1 \le p < +\infty$, $(l^p)^* = l^q$.

(As might be expected, the proof of this theorem is considerably 
simpler than that of Theorem 2, since no measure theory is involved.) 

Exercises 

1. Demonstrate a real-valued function which is uniformly, but not
absolutely, continuous on the interval $[0, 1]$.

2. Prove Theorem 3. 


§5. THE HAHN-BANACH THEOREM 

If $N$ is any normed linear space, the dual space $N^*$ is not empty, for
it certainly contains the zero functional. It appears almost obvious that
(aside from the trivial case when $N$ consists exclusively of the zero vector)
$N^*$ must contain other functionals as well. Incidentally, Theorem 1-2
gives a very complete picture of the structure of $N^*$ when $N$ is
finite-dimensional; the absence of any reference in Theorem 1-2 to a norm is
readily remedied by norming $N$ in any manner consistent with the
definition of a norm. The case of an infinite-dimensional normed linear
space is more intricate and involves the use of a logical principle, namely
Zorn's lemma, which plays a rather peculiar and controversial role in the
development of mathematics. However, we shall use it unquestioningly
at the decisive point in the development of the present section, but we
shall provide an exceedingly brief explanation of this principle for the
benefit of the reader who may not be acquainted with it.




Definition 1: Let $L$ be any linear space, let $M$ be any non-empty
subset of $L$, and let $f$ be any vector in $L$. By $[f] + M$ we mean the collection
of all vectors of the form $\alpha f + g$, where $\alpha$ is a scalar and $g \in M$.

Theorem 1: Let $M$ be a linear manifold of a linear space $L$.

(a) If $f$ is any vector in $L$, then $[f] + M$ is also a linear manifold; if
$f \in M$, then $[f] + M = M$, while if $f \notin M$, then $M$ is a proper subset of
$[f] + M$.

(b) If $S$ is a subspace of a normed linear space $N$ and if $f$ is any vector
in $N$, then $[f] + S$ is also a subspace.

Proof: (a) Left to reader as Exercise 1.

(b) If $f \in S$, then, by part (a), $[f] + S = S$, and so $[f] + S$ is
certainly a subspace. Therefore, we may confine attention to the case
when $f \notin S$. We know from (a) that $[f] + S$ is a linear manifold, so we
have only to prove that $[f] + S$ is closed. Suppose not; then there
would exist a limit-point of $[f] + S$ which does not belong to $[f] + S$.
Calling this vector $h$, we could find a sequence $h_1, h_2, h_3, \ldots$ of vectors
which do belong to $[f] + S$ and which converge to $h$. Each vector $h_n$ can
be written in the form $\alpha_n f + g_n$, where $g_n \in S$. There are two
possibilities: the set of numbers $\{\alpha_n\}$ is bounded or it is unbounded. Suppose
it is unbounded. Then by confining attention to a suitably chosen
subsequence, if necessary, we may suppose that $|\alpha_n| \to \infty$. Each $g_n$ may be
expressed as $-\alpha_n g_n'$, where $g_n'$ also belongs to $S$. Then, dividing the limit
relation $\alpha_n f + g_n \to h$ by $\alpha_n$, we obtain $f - g_n' \to o$ (since $(1/\alpha_n)h \to o$);
but this says that $f$ is a limit-point of $S$, and since $S$ is closed this forces
$f \in S$, contradicting the assumption that $f \notin S$.

Therefore, the set of numbers $\{\alpha_n\}$ is bounded, and by the Bolzano-
Weierstrass property we can select a subsequence of the $\alpha_n$'s which
converges to a number $\alpha$. Then $\alpha_n f \to \alpha f$, and from the relation
$\alpha_n f + g_n \to h$ we see that $g_n$ must approach a limit, which we call $g$. Thus,
$\alpha f + g = h$. Also, since the $g_n$'s all belong to $S$ and since $S$ is closed, $g$
must belong to $S$, and so $h \in [f] + S$. Thus $[f] + S$ contains all its
limit-points; i.e., it is closed.

Theorem 2: Let $S$ be a proper subspace of the real normed linear
space $N$, let $l$ be a bounded linear functional on $S$ (not on $N$), and let $f$ be a
vector belonging to $N - S$. Then it is possible to extend $l$ from $S$ to $[f] + S$
without increasing its norm. (That is, there exists a bounded linear functional
on $[f] + S$ whose norm is equal to $\|l\|$ and whose restriction to $S$ is $l$.)


Proof: (a) Every vector belonging to $[f] + S$ is expressible in the
form $\alpha f + g$, where $\alpha \in R$ and $g \in S$; the vector $g$ and the number $\alpha$ are
uniquely determined by the given vector, for if there were two different
representations we would have $\alpha_1 f + g_1 = \alpha_2 f + g_2$, or
$(\alpha_1 - \alpha_2) f = g_2 - g_1$, so that, if $\alpha_1 \ne \alpha_2$,
$f = (1/(\alpha_1 - \alpha_2))(g_2 - g_1)$. This says that $f$ certainly belongs
to $S$, contrary to assumption. Thus, $\alpha_1 = \alpha_2$, and so $g_1 = g_2$; thus we
have the uniqueness. Obviously, the number $\alpha$ is zero iff the vector
under consideration belongs to $S$.

(b) Suppose that the desired extension of $l$ exists; call it $\tilde{l}$. Then for
any vector $h$ in $[f] + S$ we have $h = \alpha f + g$, where the scalar $\alpha$ and the
vector $g \in S$ are uniquely determined by $h$, and so we must have
$\tilde{l}(h) = \alpha \tilde{l}(f) + \tilde{l}(g)$. However, since $\tilde{l}$ is an extension of $l$, this last equation
reduces to $\tilde{l}(h) = \alpha \tilde{l}(f) + l(g)$, and so we see that the problem of
determining $\tilde{l}$ reduces to the problem of determining suitably one fixed number,
$\tilde{l}(f)$. Now, it is clear that no matter what value we assign to $\tilde{l}(f)$, say $\beta$,
the equation $\tilde{l}(h) = \alpha\beta + l(g)$ determines a linear functional on $[f] + S$
which extends $l$, but it is not necessarily true that $\tilde{l}$ will have the same norm
as $l$; for example, if we choose $\beta > \|l\| \cdot \|f\|$ we clearly have
$|\tilde{l}(f)| = \beta > \|l\| \cdot \|f\|$, and so $\|\tilde{l}\| > \|l\|$. (In fact, it is not even
obvious, but it happens to be true, that every choice of $\beta$ makes $\tilde{l}$ a
bounded linear functional.) Thus, our problem is to show that there
exists at least one choice of the number $\beta$ which will prevent $\|\tilde{l}\|$ from
exceeding $\|l\|$. (Of course, it is impossible to accomplish $\|\tilde{l}\| < \|l\|$.)

(c) If $l$ is the zero functional, there is nothing to prove; simply choose
$\beta = 0$. Next, suppose that $\|l\| = 1$. Then we have to show that there
exists a number $\beta$ such that for every real number $\alpha$ and for every vector
$g \in S$ the inequality

$$|\alpha\beta + l(g)| \le \|\alpha f + g\|$$

must hold. We consider two separate cases: $\alpha = 0$, $\alpha \ne 0$.

(i) If $\alpha = 0$, any $\beta$ will suffice, since the inequality becomes
$|l(g)| \le \|g\|$, which is certainly true, since $\|l\| = 1$.

(ii) If $\alpha \ne 0$, the preceding inequality is equivalent to

$$\left|\beta + l\!\left(\frac{g}{\alpha}\right)\right| \le \left\|f + \frac{g}{\alpha}\right\|.$$

Now, for any non-zero $\alpha$, $g/\alpha$ sweeps out $S$ as $g$ sweeps out $S$. Hence,
we have succeeded in eliminating $\alpha$ from our problem; we have the
following simplified problem: To show that it is possible to choose $\beta$ in such a
way that $|\beta + l(g)| \le \|f + g\|$ for every $g$ in $S$. Now, choose any two
vectors, $g_1$ and $g_2$, in $S$. Since $l$ is defined and linear on $S$ and since $\|l\| = 1$,
we certainly have

$$l(g_1) - l(g_2) = l(g_1 - g_2) \le \|g_1 - g_2\| = \|(g_1 + f) + (-g_2 - f)\|$$
$$\le \|g_1 + f\| + \|-g_2 - f\| = \|f + g_1\| + \|f + g_2\|,$$

and so

$$-l(g_2) - \|f + g_2\| \le -l(g_1) + \|f + g_1\|.$$

Now, fix $g_2$, and let $g_1$ vary freely over $S$. The right side cannot go below
the left side, and so we conclude that the right side is bounded below, and
that $-l(g_2) - \|f + g_2\| \le \inf_{g \in S} \{-l(g) + \|f + g\|\}$ for any $g_2 \in S$. Now
let $g_2$ vary freely over $S$. Repeating the preceding argument, we obtain

$$\sup_{g \in S} \{-l(g) - \|f + g\|\} \le \inf_{g \in S} \{-l(g) + \|f + g\|\}.$$

For convenience we denote the left and right sides of this last inequality
as $\gamma$ and $\delta$, respectively. Let $\beta$ be chosen as any number in the closed
interval $[\gamma, \delta]$ (which may consist of a single point, but, as we have seen,
is certainly not empty). With this choice of $\beta$ we have, for any vector
$g \in S$,

$$-l(g) - \|f + g\| \le \beta \le -l(g) + \|f + g\|,$$

or

$$-\|f + g\| \le \beta + l(g) \le \|f + g\|,$$

or

$$|\beta + l(g)| \le \|f + g\|,$$

which is exactly what we wanted.

(d) Finally, if $\|l\| \ne 0$ and $\|l\| \ne 1$, simply apply the previous
reasoning to the bounded linear functional $(1/\|l\|)\,l$, which has norm 1.
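
The choice of the number in part (c) can be checked numerically in a small case. The following sketch is an illustration only, not part of the proof: the space R^2 with the norm |x| + |y|, the subspace S of vectors (t, 0), the functional l(t, 0) = t, the vector f = (0, 1), and the finite grid standing in for S are all assumptions made for the example. For these choices the sup and the inf in the proof work out to -1 and 1, so that the interval of admissible values is not a single point.

```python
# Illustration only: N = R^2 with the norm |x| + |y| (an assumed example),
# S = {(t, 0)}, l(t, 0) = t (so ||l|| = 1), and f = (0, 1), which is not in S.

def norm1(v):
    """The norm |x| + |y| on R^2."""
    return abs(v[0]) + abs(v[1])

def l(g):
    """The given functional on S; g is a vector (t, 0)."""
    return g[0]

f = (0.0, 1.0)

def f_plus(g):
    """The vector f + g."""
    return (f[0] + g[0], f[1] + g[1])

# A finite grid of vectors of S standing in for all of S (t from -50 to 50).
grid = [(k / 100.0, 0.0) for k in range(-5000, 5001)]

# gamma = sup of -l(g) - ||f + g||, delta = inf of -l(g) + ||f + g||, over the grid.
gamma = max(-l(g) - norm1(f_plus(g)) for g in grid)
delta = min(-l(g) + norm1(f_plus(g)) for g in grid)

# Any beta in [gamma, delta] should give |beta + l(g)| <= ||f + g|| on the grid.
beta = (gamma + delta) / 2.0
extension_ok = all(abs(beta + l(g)) <= norm1(f_plus(g)) + 1e-12 for g in grid)
```

Replacing norm1 by the maximum norm max(|x|, |y|) in the same sketch makes the two bounds coincide at 0, which is the case of a single admissible value mentioned in the proof.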

Remark: Theorem 2 has been stated only for a normed linear space 
over the real field, and the proof depends very heavily on the ordering of 
the real numbers by the relation <. The theorem itself is, in fact, true 
for a complex normed linear space also, but, since the present theorem 
is only a preliminary to Theorem 5, we merely point out here that the proof 
of the latter theorem will indicate clearly how Theorem 2 can be extended 
from the real to the complex field of scalars. 

Theorem 3: Let $N$ be a normed linear space, let $M$ be a linear
manifold, but not a subspace, of $N$. Then the closure $\bar{M}$ is a subspace of $N$.

Proof; Trivial. 

Theorem 4: Let $N$ be a normed linear space, let $M$ be a linear
manifold, but not a subspace, of $N$, and let $l$ be a bounded linear functional
defined only on $M$. Then $l$ can be extended in a unique manner to become a
bounded linear functional on $\bar{M}$, and the extension has exactly the same
norm as $l$.





Proof: Trivial. 

Theorem 5 (Hahn-Banach): Let $N$ be a normed linear space,
real or complex, let $S$ be a proper subspace of $N$, and let $l$ be a bounded
linear functional on $S$. Then $l$ can be extended to $N$ without increasing the
norm.*

Proof: It appears to be in the nature of things that this theorem 
must first be proven for the real field, after which the case of the complex 
field is treated as a corollary. Accordingly, we divide the proof into two 
portions. 

(a) If $N$ is a real normed linear space, Theorem 2 assures us that
there exists a proper extension of $l$; i.e., a bounded linear functional $\tilde{l}$,
defined on a subspace $\tilde{S}$ which properly contains $S$, such that $\|\tilde{l}\| = \|l\|$
and $l$ is the restriction of $\tilde{l}$ to $S$. In other words, the collection $P$ of all
norm-preserving extensions of $l$ to linear manifolds properly containing $S$
is non-empty. If $l_1$ and $l_2$ are members of $P$ such that $l_2$ is either the same
as $l_1$ or a proper extension of $l_1$, we write $l_1 \le l_2$. The relation $\le$
obviously constitutes a partial ordering of $P$; furthermore, if $C$ is any chain
in $P$ (with respect to this partial ordering), the chain $C$ determines
unambiguously a particular member $l_C$ of $P$ in the following way: For any
$g$ appearing in the domain of any member $l'$ of $C$, we define $l_C(g)$ to be
$l'(g)$. (Note carefully that, since $C$ is a chain, two different members of $C$
having $g$ in their domains must agree at $g$, so that, as asserted previously,
$l_C(g)$ is indeed unambiguously defined.) It is not difficult to see that $l_C$ is
a linear functional, and since $|l_C(g)| = |l'(g)| \le \|l'\| \cdot \|g\| = \|l\| \cdot \|g\|$, it
follows that $\|l_C\| = \|l\|$; furthermore, $l_C \ge l'$ for every member $l'$ of the
chain $C$. According to Zorn's lemma, there exists a member of $P$, which
we shall denote $l_m$ (the letter $m$ suggesting "maximal"), having the
property that there does not exist in $P$ any proper extension of $l_m$. By
referring to Theorem 4, we see that the domain of $l_m$ must be a subspace,
and then by referring to Theorem 2 we see that the domain of $l_m$ must
coincide with $N$. Thus, the proof is complete in the case of a real normed
linear space.

(b) If $N$ is a complex normed linear space, we can also consider $N$ as
a real normed linear space by the simple device of restricting scalar
multipliers of vectors to the real field. We shall find it convenient to
refer to $N$, when considered as a real normed linear space, as $N_r$. For
any vector $f$ in the domain of $l$, we may write $l(f) = g(f) + ih(f)$, where
$g$ and $h$ are real-valued functionals. If $\alpha$ is real, we immediately obtain,
on the one hand, $l(\alpha f) = g(\alpha f) + ih(\alpha f)$, and, on the other hand,
$l(\alpha f) = \alpha l(f) = \alpha g(f) + i\alpha h(f)$. Thus, $g$ and $h$ are (real) linear
functionals. On the other hand, $l(if) = g(if) + ih(if) = il(f) = -h(f) + ig(f)$.


* If the reader is not acquainted with Zorn’s lemma, he should at this time turn to 
Appendix A and then return to the proof of the present theorem. 





Hence, $h(f) = -g(if)$, and so we may express the complex linear functional
$l$ entirely in terms of the real linear functional $g$ as follows:
$l(f) = g(f) - ig(if)$. (We dispense with the trivial proof that the functional $g$
is actually linear.) Since $|g(f)| = |\mathrm{Re}\, l(f)| \le |l(f)| \le \|l\| \cdot \|f\|$, we see that $g$ is a
bounded linear functional defined on the subspace $S$ of $N_r$. By the first
half of this proof, we can extend $g$ to a bounded linear functional $g_m$
defined on all of $N_r$ without increasing the norm. Then it is readily
confirmed that the functional $l_m$ defined on all of $N$ by the relation

$$l_m(f) = g_m(f) - i\,g_m(if)$$

is indeed a linear functional. Furthermore,
$|l_m(f)| \le |g_m(f)| + |g_m(if)| \le \|g_m\| \cdot (\|f\| + \|if\|) = 2\|g_m\| \cdot \|f\| \le 2\|l\| \cdot \|f\|$,
and so $\|l_m\| \le 2\|l\|$.
In order to eliminate the factor 2, we employ the following trick: For
any $f \in N$, we can find a complex number $\alpha$, $|\alpha| = 1$, such that $(1/\alpha)\,l_m(f)$
is real and non-negative. Then
$l_m((1/\alpha)f) = g_m((1/\alpha)f) - i\,g_m((i/\alpha)f) = g_m((1/\alpha)f)$, and therefore
$|l_m(f)| = |(1/\alpha)\,l_m(f)| = |l_m((1/\alpha)f)| = |g_m((1/\alpha)f)| \le \|g_m\| \cdot \|(1/\alpha)f\| = \|g_m\| \cdot \|f\| \le \|l\| \cdot \|f\|$,
and so $\|l_m\| \le \|l\|$. Since $l_m$ is an extension of $l$ (to all of $N$), the inequality
$\|l_m\| < \|l\|$ cannot hold, and so $\|l_m\| = \|l\|$.
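
The identity l(f) = g(f) - ig(if), on which part (b) rests, is easily tried out in a finite-dimensional case. The sketch below is only an illustration under assumed data: the space is C^3, and the coefficients c and the test vector f are arbitrary choices that do not come from the text.

```python
# Illustration only: a complex linear functional on C^3 with assumed coefficients c.
c = [2 + 1j, -1j, 0.5 - 3j]

def l(f):
    """Complex linear functional l(f) = sum of c_k * f_k."""
    return sum(ck * fk for ck, fk in zip(c, f))

def g(f):
    """Real part of l; linear over the reals, but not over the complex field."""
    return l(f).real

def times_i(f):
    """The vector i*f."""
    return [1j * fk for fk in f]

def l_from_g(f):
    """Rebuild l from g alone by means of l(f) = g(f) - i g(if)."""
    return g(f) - 1j * g(times_i(f))

f = [1 - 2j, 3 + 0.5j, -4j]
original = l(f)
recovered = l_from_g(f)
```

The two values agree up to rounding, which reflects the fact that a complex linear functional is completely determined by its real part.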

Corollary: (a) If $f$ is any non-zero vector in a normed linear space
$N$, there exists a member $l$ of $N^*$ such that $l(f) = \|f\|$ and $\|l\| = 1$.

(b) $N^*$ separates $N$, in the sense that, given any two distinct vectors
$g$, $h$ of $N$, there exists a member $l$ of $N^*$ such that $l(g) \ne l(h)$.

Proof: (a) Let $S$ be the subspace of $N$ consisting of all vectors of
the form $\alpha f$, and for each vector $g$ $(= \alpha f)$ in $S$ let $l(g) = \alpha \|f\|$. Clearly,
$l(f) = \|f\|$ and $\|l\| = 1$. By Theorem 5, we can extend $l$ to all of $N$
without increasing its norm.

(b) Let $f = g - h$ and choose $l$ in accordance with part (a). Then
$l(g) - l(h) = l(g - h) = l(f) = \|f\| \ne 0$, so that $l(g) \ne l(h)$.
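
In a concrete space the functional promised by part (a) can often be written down explicitly, without any appeal to Zorn's lemma. The sketch below is an illustration under assumptions of our own: the space is R^4 with the norm given by the sum of the absolute values of the coordinates, and the vector f and the sample vectors are arbitrary.

```python
# Illustration only: N = R^4 with the norm ||x|| = sum of |x_i| (an assumed example).

def norm1(x):
    """The norm ||x|| = |x_1| + ... + |x_4|."""
    return sum(abs(xi) for xi in x)

def sign(t):
    """Returns 1, 0 or -1 according to the sign of t."""
    return (t > 0) - (t < 0)

f = [3.0, -1.5, 0.0, 2.0]
signs = [sign(fi) for fi in f]

def l(x):
    """l(x) = sum of sign(f_i) * x_i, so that |l(x)| <= ||x|| and l(f) = ||f||."""
    return sum(s * xi for s, xi in zip(signs, x))

samples = [[1.0, 2.0, -3.0, 0.5], [-2.0, 0.0, 4.0, -1.0], [0.1, -0.2, 0.3, -0.4]]
attains_norm = (l(f) == norm1(f))
norm_at_most_one = all(abs(l(x)) <= norm1(x) for x in samples)
```

The check on the samples does not prove that the norm of l is 1; that follows from the inequality |l(x)| <= ||x||, which holds term by term, together with l(f) = ||f||.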

We conclude by noting that the completeness of N is not assumed 
anywhere in this section; however, the completeness of the field of 
scalars (R or C) plays an essential role. 

Exercises 

1 . Prove part (a) of Theorem 1 . 

2. Prove the assertion made in the first parenthetical remark appearing 

in part (b) of the proof of Theorem 2. 



CHAPTER 6 

OPERATORS 


§1. LINEAR TRANSFORMATIONS 
AND OPERATORS 

Although we can consider mappings from one linear space into 
another (we have done this when we developed the theory of linear 
functionals in the preceding chapter), we shall confine attention in this 
chapter entirely to mappings from a given linear space into itself. For 
definiteness we consider complex spaces. 

Definition 1: A linear transformation is a mapping $T$ of a linear
space $V$ into $V$ satisfying the condition $T(\alpha f + \beta g) = \alpha(T(f)) + \beta(T(g))$
for all vectors $f$, $g$ and all scalars $\alpha$, $\beta$. (Clearly, $T(o) = o$.) We write $Tf$
instead of $T(f)$ whenever no confusion can result.

Definition 2: If $\alpha$ is a scalar and $T$ is a linear transformation, then
$\alpha \cdot T$ is the mapping $S$ defined in the obvious manner: $Sf = \alpha(Tf)$ for all
vectors $f$. The sum $T_1 + T_2$ of two linear transformations is the mapping $S$
defined in the obvious manner: $Sf = (T_1 f) + (T_2 f)$ for all vectors $f$. (The
mapping $\alpha \cdot T$ is usually denoted $\alpha T$.)

Examples 

(a) Let $V$ be the class of ordered pairs of scalars and let the vector $a$
be the ordered pair $(\alpha_1, \alpha_2)$. The mapping $T$ defined by the equation
$Ta = (\gamma_1\alpha_1 + \gamma_2\alpha_2,\ \gamma_3\alpha_1 + \gamma_4\alpha_2)$, where $\gamma_1, \gamma_2, \gamma_3, \gamma_4$ are four arbitrary,
but fixed, scalars, is clearly a linear transformation. The mapping $\alpha T$ is
obviously obtained by replacing $\gamma_1, \gamma_2, \gamma_3, \gamma_4$ by $\alpha\gamma_1, \alpha\gamma_2, \alpha\gamma_3, \alpha\gamma_4$,
respectively. If, similarly, $T'$ is the linear transformation obtained by
employing scalars $\gamma_1', \gamma_2', \gamma_3', \gamma_4'$, then $T + T'$ is the linear transformation
obtained by employing the scalars $\gamma_1 + \gamma_1'$, $\gamma_2 + \gamma_2'$, $\gamma_3 + \gamma_3'$, $\gamma_4 + \gamma_4'$.

(b) Let $V$ be the class of continuous complex-valued functions
defined on the interval $[0, 1]$, and for each vector $f$ let $Tf$ be the function
$g$ defined by the equation $g(x) = \int_0^x f(t)\,dt$.

Note that in example (a) the linear transformation $T$ is an onto
mapping* iff $\gamma_1\gamma_4 - \gamma_2\gamma_3 \ne 0$, while in example (b) the linear
transformation $T$ is certainly not onto (why?).

Theorem 1: The class of all linear transformations on $V$, denoted
$L(V)$, is itself a linear space (with the definitions of $\cdot$ and $+$ given in
Definition 2).

Proof: Trivial. 

Definition 3: The product $T_1T_2$ of the linear transformations $T_1$ and
$T_2$ is the transformation $S$ defined in the obvious manner: $Sf = T_1(T_2 f)$
for all vectors $f$.

Theorem 2: The product of two linear transformations is again a
linear transformation. In general, $T_1T_2 \ne T_2T_1$; if equality holds, then
$T_1$ and $T_2$ are said to commute.

Proof: Left as Exercise 1. 
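
That products need not commute is already visible among the transformations of Example (a). In the following sketch a transformation is stored as its four scalars; the helper names apply and product, and the particular scalars chosen, are assumptions made for the illustration.

```python
# Illustration only: a transformation of Example (a) is stored as its four
# scalars (y1, y2, y3, y4), acting by T(a1, a2) = (y1*a1 + y2*a2, y3*a1 + y4*a2).

def apply(T, a):
    """The image of the ordered pair a under T."""
    y1, y2, y3, y4 = T
    return (y1 * a[0] + y2 * a[1], y3 * a[0] + y4 * a[1])

def product(T1, T2):
    """Scalars of the product T1 T2, defined by (T1 T2) a = T1 (T2 a)."""
    p1, p2, p3, p4 = T1
    q1, q2, q3, q4 = T2
    return (p1 * q1 + p2 * q3, p1 * q2 + p2 * q4,
            p3 * q1 + p4 * q3, p3 * q2 + p4 * q4)

T1 = (1, 1, 0, 1)
T2 = (1, 0, 1, 1)
T1T2 = product(T1, T2)   # scalars of T1 T2
T2T1 = product(T2, T1)   # scalars of T2 T1
a = (2, 5)
```

Here T1T2 has the scalars (2, 1, 1, 1) while T2T1 has the scalars (1, 1, 1, 2), so this particular pair does not commute.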

Theorem 3: If $T_1$, $T_2$, $T_3$ are any three linear transformations, then
$T_1(T_2T_3) = (T_1T_2)T_3$, so that we may simply write $T_1T_2T_3$. (By induction
it follows that $T_1T_2 \cdots T_n$ is unambiguously defined.)

Proof; Left as Exercise 2. 

Definition 4: The linear transformations $O$ (zero) and $I$ (identity)
are defined in the obvious manner: $Of = o$ and $If = f$ for every vector $f$.
Obviously $OT = TO = O$ and $IT = TI = T$ for every linear
transformation $T$.

Theorem 4: For any linear transformations $T_1$, $T_2$, $T_3$ and scalars
$\alpha$, $\beta$, the equality $T_1(\alpha T_2 + \beta T_3) = \alpha(T_1T_2) + \beta(T_1T_3)$ holds.†


* Recall that a mapping from a set $A$ to a set $B$ is said to be onto if each member of
$B$ is the image of at least one member of $A$; that is, if the range of the mapping coincides
with $B$.

† Theorems 1 through 4 can be condensed into the single statement that $L(V)$ is an
algebra. We shall not present a precise definition of this term, however; roughly, it
denotes a linear space in which any pair of vectors (as well as a scalar and a vector) can
be multiplied.





Proof: Left as Exercise 3. 

Now we turn to the case that $V$ is a normed linear space; we therefore
denote the space as $N$ rather than $V$. The ideas presented here constitute
an obvious generalization of some of the ideas concerning bounded
linear functionals.

Definition 5: A linear transformation $T$ defined on $N$ is said to be
bounded if there exists a real number $C$ such that $\|Tf\| \le C\|f\|$ for every
vector $f$. If $T$ is bounded, $\|T\|$, called the norm of $T$, is defined as the inf of
all values of $C$ for which the preceding inequality holds for all vectors $f$.
(Cf. Exercises 4, 5, 6.) A bounded linear transformation is henceforth
called an operator.
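
The norm of Definition 5 can be estimated numerically in a simple case by sampling unit vectors. The sketch below is an illustration only; the space R^2 with the Euclidean norm, the transformation T(a1, a2) = (3 a1, a2), and the number of sample points are assumptions made for the example. Sampling yields in general only a lower estimate of the norm; here the estimate is exact because the vector (1, 0) is among the samples.

```python
import math

# Illustration only: N = R^2 with the Euclidean norm, T(a1, a2) = (3*a1, a2).

def T(a):
    """The assumed linear transformation."""
    return (3.0 * a[0], 1.0 * a[1])

def norm(a):
    """The Euclidean norm on R^2."""
    return math.hypot(a[0], a[1])

# For unit vectors f = (cos t, sin t), ||Tf|| never exceeds 3 and equals 3
# at f = (1, 0), so the smallest admissible constant C is 3.
angles = [2.0 * math.pi * k / 3600 for k in range(3600)]
values = [norm(T((math.cos(t), math.sin(t)))) for t in angles]
estimate = max(values)
```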

Theorem 5: (a) $\|\alpha T\| = |\alpha| \cdot \|T\|$ for any scalar $\alpha$ and operator $T$.

(b) $\|T_1 + T_2\| \le \|T_1\| + \|T_2\|$ for any operators $T_1$ and $T_2$.

(c) $\|T_1T_2\| \le \|T_1\| \cdot \|T_2\|$ for any operators $T_1$ and $T_2$.

(d)* $\|O\| = 0$, $\|I\| = 1$.

Proof: Left as Exercise 7. 

Definition 6: For any operator $T$ and positive integer $n$, we define
$T^n$ as $TT \cdots T$ ($n$ times). We occasionally find it convenient to define
$T^0$ as $I$.

Definition 7: A non-zero operator $T$ is said to be nilpotent if $T^n = O$
for some positive integer $n$. (Cf. Exercise 8.)

Exercises 

1 . Prove Theorem 2. 

2. Prove Theorem 3. 

3. Prove Theorem 4. 

4. Give an example of an unbounded linear transformation. 

5. Show that a linear transformation defined on a finite-dimensional 

normed linear space is certainly bounded. 

6. Show that, if $T$ is an operator, then

$$\|T\| = \sup_{\|f\| \le 1} \|Tf\| = \sup_{\|f\| = 1} \|Tf\| = \sup_{f \ne o} \frac{\|Tf\|}{\|f\|}.$$

7. Prove Theorem 5. 

8. Give an example of a nilpotent operator. 


* Strictly speaking, the second assertion is correct only if $N$ contains a non-zero
vector, for when $N$ consists exclusively of the vector $o$ (and only in this case), $I = O$.





§2. THE ADJOINT OPERATOR 

We shall now show that any operator $T$ on a normed linear space $N$
determines in a natural manner an associated, or dual, operator on $N^*$.
Let $l$ denote any bounded linear functional on $N$ (i.e., $l \in N^*$). Then for
each $f \in N$ let us define $l_T(f)$ as follows: $l_T(f) = l(Tf)$. Clearly, for any
vectors $f$, $g$ and scalars $\alpha$, $\beta$ the following chain of equalities holds:
$l_T(\alpha f + \beta g) = l(T(\alpha f + \beta g)) = l(\alpha(Tf) + \beta(Tg)) = \alpha l(Tf) + \beta l(Tg) = \alpha l_T(f) + \beta l_T(g)$.
Thus, $l_T$ is a linear functional on $N$. Furthermore,
$|l_T(f)| = |l(Tf)| \le \|l\| \cdot \|Tf\| \le (\|l\| \cdot \|T\|)\|f\|$, and so $l_T$ is bounded; in
fact, $\|l_T\| \le \|T\| \cdot \|l\|$. Also, it is obvious that if $l$ and $l'$ are members of
$N^*$ and if $\alpha$ and $\beta$ are any scalars, then $(\alpha l + \beta l')_T = \alpha l_T + \beta l'_T$. The
results of the last two sentences can be summed up by saying that the
mapping of $N^*$ into $N^*$ defined by $l \to l_T$ is an operator. We denote this
operator $T^*$ and call it the adjoint of $T$. Note carefully that, while $T$ is an
operator on $N$, $T^*$ is an operator on $N^*$.

Of course, one can now consider $(T^*)^*$, which is an operator on
$N^{**}$. We have seen in Chapter 4 that $N \subseteq N^{**}$, so that $(T^*)^*(f)$ is
well-defined for every $f \in N$. As one might expect, $(T^*)^*(f) = Tf$ (cf.
Exercise 1), and so $(T^*)^*$ is an extension of $T$. (In general, $(T^*)^*$ is a
proper extension of $T$, for $N$ is usually a proper subset of $N^{**}$.)

Theorem 1: (a) $(T_1 + T_2)^* = T_1^* + T_2^*$;

(b) $(\alpha T)^* = \alpha T^*$;

(c) $(I_N)^* = I_{N^*}$;

(d) $(T_1T_2)^* = T_2^*T_1^*$.

Proof: Trivial. Of course, (c) means that the identity on $N$ has as
its adjoint the identity on $N^*$.

Theorem 2: $\|T^*\| = \|T\|$.

Proof: The inequality $\|l_T\| \le \|T\| \cdot \|l\|$, which was established in
the first paragraph of this section, shows that $\|T^*\| \le \|T\|$. As for the
reverse inequality, it is obviously true if $\|T\| = 0$. Therefore we may
confine attention to the case $\|T\| > 0$, and by homogeneity it suffices to
prove that $\|T^*\| \ge 1$ when $\|T\| = 1$. Given any positive number $a < 1$
(say $a = 0.999$), we can find a unit vector $f$ such that $\|Tf\| > a$. Let $Tf$
temporarily be called $g$. By the corollary to the Hahn-Banach theorem,
there exists a linear functional $l$ such that $\|l\| = 1$ and $|l(g)| = \|g\|$. Thus,
$a < \|g\| = |l(Tf)| = |(T^*l)(f)| \le \|T^*l\| \cdot \|f\| = \|T^*l\| \le \|T^*\| \cdot \|l\| = \|T^*\|$.
Hence, $\|T^*\| > a$, and since $a$ can be chosen arbitrarily close to 1,
we obtain the desired inequality, $\|T^*\| \ge 1$. This completes the proof.

When we take account of Theorem 3-1 of Chapter 5, we see that the
preceding discussion of the adjoint operator assumes an especially simple
and elegant form in a Hilbert space $H$. Every vector $g$ in $H$ determines a
member of $H^*$, and every member of $H^*$ is determined by a (unique)
member of $H$. Furthermore, if for clarity we denote the bounded linear
functional on $H$ determined by $g$ by the symbol $l_g$, we have $l_{g+h} = l_g + l_h$
and $\|l_g\| = \|g\|$. Thus, the correspondence $g \leftrightarrow l_g$ between $H$ and $H^*$ is
additive and norm-preserving. However, a small complication arises
when we observe the effect of multiplying $g$ by a scalar $\alpha$:

$$l_{\alpha g}(f) = (f, \alpha g) = \bar{\alpha}(f, g) = \bar{\alpha}\, l_g(f).$$

Thus, instead of obtaining $l_{\alpha g} = \alpha l_g$, we have $l_{\alpha g} = \bar{\alpha} l_g$. Therefore, the
mapping $g \to l_g$ is not linear; it is anti-linear, or conjugate-linear. (This
complication does not arise, of course, when we deal with a real Hilbert
space.)

Since $H$ and $H^*$ are essentially the same spaces, it is rather natural to
think of $T^*$ as an operator defined on $H$ rather than on $H^*$. To be more
precise, we reason in the following manner, which enables us to disregard
completely the concept of the dual space $N^*$ of a normed linear space $N$:
Let $T$ be an operator on $H$ and let $g$ be a fixed vector in $H$, while the
vector $f$ varies over all of $H$. Then the expression $(Tf, g)$ obviously
defines a linear functional on $H$; furthermore, this functional is bounded, for
$|(Tf, g)| \le \|g\| \cdot \|Tf\| \le (\|g\| \cdot \|T\|)\|f\|$. By the Riesz representation
theorem, there exists a unique vector $g_T$ such that $(Tf, g) = (f, g_T)$ for
all $f$. Now let us consider how $g_T$ behaves as $g$ varies freely over $H$.
Clearly, $(Tf, g + h) = (Tf, g) + (Tf, h) = (f, g_T) + (f, h_T) = (f, g_T + h_T)$,
and so $(g + h)_T = g_T + h_T$. Also, $(Tf, \alpha g) = \bar{\alpha}(Tf, g) = \bar{\alpha}(f, g_T) =
(f, \alpha g_T)$, and so $(\alpha g)_T = \alpha g_T$. Hence, the mapping $g \to g_T$ is linear (not
anti-linear). We now proceed to show that this mapping is also bounded;
i.e., that there exists a number $C$ such that $\|g_T\| \le C\|g\|$ for all vectors $g$.
We simply replace $f$ in the equality $(Tf, g) = (f, g_T)$ by $g_T$, and so we
obtain $(Tg_T, g) = (g_T, g_T)$; by the Schwarz inequality we obtain
$\|g_T\|^2 \le \|Tg_T\| \cdot \|g\| \le \|T\| \cdot \|g_T\| \cdot \|g\|$, and then by dividing by $\|g_T\|$
we obtain $\|g_T\| \le \|T\| \cdot \|g\|$. (If $g_T = o$ the division is not legitimate,
but the last inequality is trivially true in this case.) Hence, the linear
mapping $g \to g_T$ has been shown to be bounded, with norm not exceeding
$\|T\|$.

The mapping $g \to g_T$ is henceforth denoted $T^*$, and called the adjoint
of $T$. This is not quite correct, for $T^*$ should, according to our earlier
definition, be an operator on $H^*$, which is closely related to, but not
isomorphic to, $H$ (because of the unpleasant appearance of the conjugate
in the relation $\bar{\alpha}(f, g_T) = (f, \alpha g_T)$). Nevertheless, when working in
Hilbert spaces we consider $T^*$ to operate on $H$, not on $H^*$. (Of course, this
cannot be done on an arbitrary normed linear space.)

Thus, if $T$ is an operator on $H$, $T^*$ is the operator on $H$ defined by the
condition: $(Tf, g) = (f, T^*g)$ for all vectors $f$ and $g$. We have already
shown that $\|T^*\| \le \|T\|$; however, it is almost trivial (cf. Exercise 2)
that the adjoint of $T^*$ (which certainly exists) must coincide with $T$, and so,
upon replacing $T$ in the last inequality by $T^*$, we obtain $\|T\| \le \|T^*\|$,
and so we have shown that the equality $\|T\| = \|T^*\|$ must hold.

Of course, this last result is in agreement with the result previously 
established for an operator defined on any normed linear space and its 
adjoint operator (defined on the dual of the given normed linear space). 
However, it is of interest to note that the argument presented in the pre- 
ceding paragraph enables us to avoid the use of the Hahn-Banach theorem. 

Returning momentarily to Theorem 1, we see that (b) must be
modified, in the present context, as follows: $(\alpha T)^* = \bar{\alpha}T^*$; parts (a) and
(d) remain correct as stated, while part (c) merely assumes the form
$I^* = I$.
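
In the finite-dimensional Hilbert space C^2 these relations can be verified directly. The sketch below is an illustration only: it assumes the inner product (f, g) = sum of f_k times the conjugate of g_k, represents an operator by a 2 x 2 complex matrix, and takes the conjugate transpose as the adjoint; the matrix, vectors, and scalar are arbitrary choices.

```python
# Illustration only: H = C^2 with (f, g) = sum of f_k * conj(g_k); an operator
# is an assumed 2x2 complex matrix stored as a list of rows.

def inner(f, g):
    """The inner product, linear in f and conjugate-linear in g."""
    return sum(fk * gk.conjugate() for fk, gk in zip(f, g))

def apply(T, f):
    """The vector Tf."""
    return [sum(T[i][j] * f[j] for j in range(2)) for i in range(2)]

def adjoint(T):
    """Conjugate transpose, which plays the role of T* on C^2."""
    return [[T[j][i].conjugate() for j in range(2)] for i in range(2)]

def scale(alpha, T):
    """The operator alpha T."""
    return [[alpha * T[i][j] for j in range(2)] for i in range(2)]

T = [[1 + 2j, 3j], [2 + 0j, -1 + 1j]]
f = [1 - 1j, 2 + 3j]
g = [-2j, 4 + 1j]
alpha = 2 - 5j

lhs = inner(apply(T, f), g)            # (Tf, g)
rhs = inner(f, apply(adjoint(T), g))   # (f, T*g)
# (alpha T)* should equal conj(alpha) T*, not alpha T*.
conj_rule = adjoint(scale(alpha, T)) == scale(alpha.conjugate(), adjoint(T))
naive_rule = adjoint(scale(alpha, T)) == scale(alpha, adjoint(T))
```

With these data the conjugated rule holds and the unconjugated one fails, in agreement with the modification of part (b).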

We conclude this section with the following easy but important 
result. 

Theorem 3: If $T$ is any operator on a Hilbert space $H$, then

$$\|T^*T\| = \|TT^*\| = \|T\|^2.$$

Proof: For any unit vector $f$ we have $\|T^*Tf\| \le \|T^*\| \cdot \|Tf\| \le
\|T^*\| \cdot \|T\| \cdot \|f\| = \|T\|^2 \cdot \|f\| = \|T\|^2$; hence $\|T^*T\| \le \|T\|^2$. On the
other hand, for any unit vector $f$ we also have $\|T^*Tf\| = \|T^*Tf\| \cdot \|f\| \ge
|(T^*Tf, f)| = |(Tf, Tf)| = \|Tf\|^2$. Taking the supremum of the end terms
over all unit vectors, we obtain $\|T^*T\| \ge \|T\|^2$. Thus, $\|T^*T\| = \|T\|^2$.
Replacing $T$ by $T^*$, we obtain $\|TT^*\| = \|T^*\|^2 = \|T\|^2$. This completes
the proof.

Exercises 

1. Prove that $(T^*)^*(f) = Tf$ for every vector $f$ in $N$.

2. Prove directly (without reference to dual spaces) that $(T^*)^* = T$
for any operator $T$ on a Hilbert space.


§3. THE INVERSE OF AN OPERATOR 

Let $A$ be any non-empty set whatsoever and let $T$ be a mapping of $A$
into $A$. If $T(A) = A$ (i.e., if the mapping $T$ is onto) and if $T(x) \ne T(y)$
whenever $x \ne y$, then the inverse mapping $T^{-1}$ is defined. If $T^{-1}$ is
temporarily denoted $S$, we readily see that $ST = I$ and $TS = I$, where $I$,
of course, denotes the identity mapping on $A$. Conversely, it is not
difficult to prove (cf. Exercise 1) that if $S$ and $T$ are mappings of $A$ into $A$
which satisfy both of the preceding equalities, then each of them is a
one-to-one mapping of $A$ onto $A$ and that each of them is the inverse of
the other. One might expect that either of the preceding equalities implies
the other, but this is not so. For example, let $A$ be the set of positive
integers, let $S$ be the mapping which sends every member $n$ of $A$ into
$n + 1$, and let $T$ send every integer $n$ exceeding 1 into $n - 1$, while $T(1) = 7$.
Then it is easily seen that $TS = I$ and $ST \ne I$. In fact, the mapping
$T$ is not even one-to-one, since $T(1) = T(8)$; on the other hand, $S$,
while one-to-one, is not onto, since there exists no member $n$ of $A$ such
that $S(n) = 1$.
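
The example just given is easily checked on a finite sample of positive integers. The sketch below uses the two mappings exactly as described; only the sample range 1 to 50 is an assumption.

```python
# The mappings S and T from the example, defined on the positive integers.

def S(n):
    """S sends n into n + 1."""
    return n + 1

def T(n):
    """T sends n > 1 into n - 1, while T(1) = 7."""
    return n - 1 if n > 1 else 7

sample = range(1, 51)   # a finite sample of positive integers (assumed range)
TS_is_identity = all(T(S(n)) == n for n in sample)
ST_failures = [n for n in sample if S(T(n)) != n]
```

On the sample, TS leaves every integer fixed, while ST fails to fix exactly one integer, namely 1, which it sends into 8.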

If $ST = I$ we say that $S$ is a left inverse of $T$ and $T$ is a right inverse
of $S$. Thus, $T^{-1}$ exists and equals $S$ iff $S$ is both a right and a left inverse
of $T$. Now suppose that $T$ has a left inverse $S$ and a right inverse $S'$; are
$S$ and $S'$ necessarily the same mapping? The answer is in the affirmative,
for we can argue as follows: $S' = IS' = (ST)S' = S(TS') = SI = S$.
(Note that we exploit the fact that composition of mappings is associative.)
Thus, a mapping which possesses a left inverse and a right inverse possesses
an inverse, and all three coincide.

Now let us turn to the case that the set $A$ is a linear space $V$ (not
necessarily normed), and let $T$ be a linear transformation on $V$. In order
for $T$ to be invertible it must, as remarked previously, be onto and
one-to-one, but the linearity of $T$ enables us to restate the second condition
as follows: $Tf \ne o$ whenever $f \ne o$ (for, if $g \ne h$, then $T(g) - T(h) =
T(g - h) \ne o$, and so $T(g) \ne T(h)$). Thus a linear transformation $T$ on
$V$ is invertible iff $T(V) = V$ and $Tf = o$ holds only when $f = o$. It is then
a triviality to show that the mapping $T^{-1}$ is also linear.

However, if $V$ is a normed linear space, $N$, and if $T$ is a bounded
linear transformation which is invertible, we may ask whether $T^{-1}$ is
necessarily bounded. The answer is in the negative, as is shown by the
following example: let $N$ consist of all sequences of scalars containing
only a finite number of non-zero entries, let $\|(\alpha_1, \alpha_2, \alpha_3, \ldots)\| =
\max_{1 \le n < \infty} |\alpha_n|$, and let $T(\alpha_1, \alpha_2, \alpha_3, \ldots) = (\alpha_1, \alpha_2/2, \alpha_3/3, \ldots)$. It is
obvious that $T$ is linear, one-to-one, onto, and bounded; in fact, $\|T\| = 1$.
However, let $a$ be the vector containing 1 in the $k$-th place and zeros
everywhere else. Then it is immediately evident that $\|a\| = 1$, $T^{-1}a = ka$,
$\|T^{-1}a\| = k\|a\| = k$, and so $\|T^{-1}\| \ge k$. Since $k$ can be chosen as large
as we wish, $\|T^{-1}\|$ is not finite; i.e., $T^{-1}$ is a linear transformation but
not an operator.
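
The growth of the inverse on the vectors of this example can be tabulated directly. In the sketch below a finitely non-zero sequence is stored as a list of its leading entries; this representation, the helper names, and the particular values of k are assumptions made for the illustration.

```python
# Illustration only: a finitely non-zero sequence is stored as a Python list
# of its leading entries (all later entries are understood to be zero).

def norm(a):
    """max |a_n|, the norm used in the example."""
    return max(abs(x) for x in a)

def T(a):
    """T(a_1, a_2, a_3, ...) = (a_1, a_2/2, a_3/3, ...)."""
    return [x / n for n, x in enumerate(a, start=1)]

def T_inv(a):
    """The inverse mapping multiplies the n-th entry by n."""
    return [x * n for n, x in enumerate(a, start=1)]

def unit(k):
    """The vector with 1 in the k-th place and zeros elsewhere."""
    return [1.0 if n == k else 0.0 for n in range(1, k + 1)]

# The ratio ||T^{-1} a|| / ||a|| for the unit vectors with k = 1, 10, 100, 1000.
growth = [norm(T_inv(unit(k))) / norm(unit(k)) for k in (1, 10, 100, 1000)]
```

The ratios are 1, 10, 100, 1000, so that no single constant C can serve as a bound for the inverse, while T itself never increases the norm.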

This unpleasant state of affairs disappears if instead of considering a 
normed linear space N we consider a Banach space B. (Note that the 
space considered in the preceding paragraph is not complete.) We shall 
first prove the following remarkable theorem and then obtain an easy 
corollary which, in turn, leads directly to the desired result. 

Theorem 1: Let $T$ be an operator (not necessarily one-to-one)
mapping the Banach space $B$ onto $B$. Then for any positive number $\epsilon$,
$T(S_\epsilon(o))$ contains a neighborhood of $o$.





Proof: (a) For each positive integer $n$ let $\mathcal{S}_n$ denote $T(S_n(o))$. By
hypothesis, $\bigcup_{n=1}^{\infty} \mathcal{S}_n = B$. Since $B$ is complete, the Baire category
theorem guarantees that at least one $\mathcal{S}_n$, say $\mathcal{S}_k$, is not nowhere dense.
Therefore, there exists a vector $a$ and a positive number $d$ such that $\mathcal{S}_k$ is
dense in $S_d(a)$. Although $a$ may fail to belong to $\mathcal{S}_k$, it does belong to
some $\mathcal{S}_n$, say $a \in \mathcal{S}_m$; i.e., $a = Th$, where $\|h\| < m$.

(b) Now we shall show that $\mathcal{S}_{k+m}$ is dense in some neighborhood of $o$.
By the triangle inequality, $\|f\| < k$ implies that $\|f - h\| < k + m$, and
so $\mathcal{S}_{k+m}$ contains the set of all vectors expressible as $T(f - h)$ for some
vector $f$ of norm $< k$. By linearity, this set is precisely the set obtained by
translating $\mathcal{S}_k$ by $-a$, and this latter set is certainly dense in $S_d(o)$. Thus,
$\mathcal{S}_{k+m}$ is dense in some neighborhood of $o$; in fact, it is dense in $S_d(o)$.

(c) By the homogeneity property of a linear transformation, $\mathcal{S}_1$ is
dense in $S_\delta(o)$, where $\delta = d/(k + m)$.

(d) Now we shall show that $\mathcal{S}_2 \supseteq S_\delta(o)$; i.e., for any vector $y$ in
$S_\delta(o)$ we shall prove that there exists a vector $g$ in $S_2(o)$ such that $Tg = y$.
Since $\mathcal{S}_1$ is dense in $S_\delta(o)$, we can find a vector $y_1$ in $\mathcal{S}_1$ such that
$\|y_1 - y\| < \delta/2$ and a vector $x_1$ in $S_1(o)$ such that $Tx_1 = y_1$. Thus,
$\|y - Tx_1\| < \delta/2$, or $\|2y - 2Tx_1\| < \delta$. Repeating this argument, we see
that we can find a vector $x_2$ in $S_1(o)$ such that $\|2y - 2Tx_1 - Tx_2\| < \delta/2$, or

$$\|4y - 4Tx_1 - 2Tx_2\| < \delta.$$

Similarly, we can find a vector $x_3$ in $S_1(o)$ such that

$$\|8y - 8Tx_1 - 4Tx_2 - 2Tx_3\| < \delta,$$

and by induction we see that we can find vectors $x_4, x_5, \ldots$ in $S_1(o)$ such
that $\|2^n y - 2^n Tx_1 - 2^{n-1} Tx_2 - \cdots - 2Tx_n\| < \delta$, or $\|y - Tg_n\| < \delta/2^n$,
where

$$g_n = x_1 + \frac{x_2}{2} + \frac{x_3}{4} + \cdots + \frac{x_n}{2^{n-1}}.$$

For $m > n$, we have

$$g_m - g_n = \frac{x_{n+1}}{2^n} + \cdots + \frac{x_m}{2^{m-1}},$$

and so

$$\|g_m - g_n\| \le \frac{\|x_{n+1}\|}{2^n} + \cdots + \frac{\|x_m\|}{2^{m-1}} < \frac{1}{2^n} + \cdots + \frac{1}{2^{m-1}} < \sum_{i=n}^{\infty} \frac{1}{2^i} = \frac{1}{2^{n-1}}.$$

Hence, the $g$'s form a Cauchy sequence, and so there exists a vector $g$ such
that $\|g - g_n\| \to 0$. Then, $\|y - Tg_n\| = \|(y - Tg) + T(g - g_n)\| < \delta/2^n$,
and so $\|y - Tg\| \le \|T(g - g_n)\| + \|(y - Tg) + T(g - g_n)\| <
\|T(g - g_n)\| + \delta/2^n \le \|T\| \cdot \|g - g_n\| + \delta/2^n$. Letting $n$ increase without
bound, we obtain $\|y - Tg\| = 0$, or $y = Tg$. Now,

$$\|g\| = \lim_{n \to \infty} \|g_n\| \le \|x_1\| + \frac{\|x_2\|}{2} + \frac{\|x_3\|}{2^2} + \cdots < 1 + \frac{1}{2} + \frac{1}{2^2} + \cdots = 2.$$

Hence, $g \in S_2(o)$, and so $y \in \mathcal{S}_2$. Thus, we have established the inclusion
$\mathcal{S}_2 \supseteq S_\delta(o)$. (Note that the boundedness of $T$ is used only once, in
claiming that $\|T(g - g_n)\| \to 0$, but this one point in the argument is
vital.)

(e) Now, by homogeneity, $\mathcal{S}_1 \supseteq S_{\delta/2}(o)$, and, again by homogeneity,
the image of $S_\epsilon(o)$ contains $S_{\epsilon\delta/2}(o)$, which is a neighborhood of $o$. The
proof is thus complete.

Corollary (Open Mapping Theorem): If $T$ satisfies the hypotheses
of Theorem 1, it is an open mapping; that is, the image of an open set is
open.

Proof: Let $G$ be any open set in $B$, let $G' = T(G)$, and let $g' \in G'$.
Then there exists a vector $g$ in $G$ such that $Tg = g'$. Since $G$ is open,
$S_\epsilon(g) \subseteq G$ for sufficiently small $\epsilon$. Hence, for any vector $h$ of norm $< \epsilon$,
$T(g + h) \in G'$, or $Tg + Th \in G'$, or $g' + Th \in G'$. By Theorem 1, as $h$
varies over $S_\epsilon(o)$, $Th$ describes a set of vectors containing a neighborhood
of $o$, and so $g' + Th$ describes a set containing a neighborhood of $g'$. Thus,
$g'$ has a neighborhood which is contained in $G'$; in other words, $g'$ is an
inner point of $G'$. Thus $G'$ is open, and so $T$ is an open mapping.

We now obtain our objective with the following corollary. 

Corollary: Let the operator $T$ furnish a one-to-one mapping of the Banach space $B$ onto $\tilde B$. Then the inverse mapping $T^{-1}$ is also an operator.

Proof: We have already noted that $T^{-1}$ is linear. Since $T$ is bounded, it is continuous, and so the preceding corollary guarantees that $T$ is an open mapping. From §1-9, an open, continuous, one-to-one mapping of a metric space onto itself (or onto another metric space) possesses a continuous inverse. Hence $T^{-1}$ is continuous; but for linear transformations continuity and boundedness are equivalent, and so $T^{-1}$ is an operator.

As the final item in this section, we state a modified form of Theorem 2-1 of Chapter 5; we dispense with the proof, since it is virtually identical with the one given for the aforementioned theorem.

Theorem 2 (Banach-Steinhaus): Let $\{T_\alpha\}$ be a collection of operators on a Banach space $B$, and suppose that for each vector $f$ the numbers $\{\|T_\alpha f\|\}$ are bounded. Then the numbers $\{\|T_\alpha\|\}$ are bounded.





Exercise 

1. Prove the assertion made in the fourth sentence of the first paragraph of this section.


§4. SEQUENCES OF OPERATORS 

Definition 1-5 and Theorem 1-5 show that the collection of all operators on a normed linear space $N$ constitutes a normed linear space, which we denote $\mathcal{B}(N)$. (The letter $\mathcal{B}$ serves to emphasize that we are dealing only with bounded transformations.) It is, therefore, possible to introduce into $\mathcal{B}(N)$ the concept of convergence. In fact, three different definitions of convergence arise quite naturally. The most obvious definition is that which comes directly from the definition of convergence in any metric space. We shall say that the sequence $\{T_1, T_2, T_3, \ldots\}$ converges uniformly, or in norm, to the operator $T$ if $\|T - T_n\| \to 0$, and we shall employ the notation $T_n \Rightarrow T$ rather than $T_n \to T$, as might be expected.

Theorem 1: If $N$ is complete, so is $\mathcal{B}(N)$.

Proof: Let $\{T_1, T_2, \ldots\}$ be any Cauchy sequence in $\mathcal{B}(N)$. Then for any vector $f$ in $N$ the sequence $\{T_1 f, T_2 f, T_3 f, \ldots\}$ is also Cauchy, for $\|T_n f - T_m f\| \le \|T_n - T_m\| \cdot \|f\| \to 0$. Since $N$ is complete, there exists a vector, which we denote $Tf$, such that $\|Tf - T_n f\| \to 0$. Clearly $T(\alpha f + \beta g) = \alpha Tf + \beta Tg$ for all $\alpha$, $\beta$, $f$, and $g$, so that $T$ is a linear transformation. Furthermore, $\|Tf\| = \lim_n \|T_n f\| \le (\lim_n \|T_n\|) \cdot \|f\|$, and so $T$ is bounded, and hence a member of $\mathcal{B}(N)$. To show that $T_n \Rightarrow T$ we argue as follows. Given $\epsilon > 0$, there exists an index $M(\epsilon)$ such that $\|T_m - T_n\| < \epsilon$ whenever $m$ and $n$ both exceed $M(\epsilon)$. For any such $m$ and $n$ and for any vector $f$ we may write $\|(T - T_m)f\| = \|(T - T_n)f + (T_n - T_m)f\| \le \|(T - T_n)f\| + \|T_n - T_m\| \cdot \|f\| < \|(T - T_n)f\| + \epsilon \|f\|$. Letting $n$ increase without bound while holding $m$ fixed, we obtain $\|(T - T_m)f\| \le \epsilon \|f\|$, and hence $\|T - T_m\| \le \epsilon$ whenever $m > M(\epsilon)$. Therefore, $\mathcal{B}(N)$ is indeed complete.

Turning again to a sequence $\{T_1, T_2, T_3, \ldots\}$ of operators on the normed linear space $N$, we note that it may happen that the sequence of vectors $\{T_1 f, T_2 f, T_3 f, \ldots\}$ converges for every vector $f$ and yet there may fail to exist an operator $T$ such that $T_n \Rightarrow T$. We shall illustrate this possibility in the case of a complete normed linear space and then turn to the consideration of an additional complication which can arise when $N$ is not complete.

Let us consider the Hilbert space $l^2$, and for any vector $f$, consisting of the sequence $(\alpha_1, \alpha_2, \alpha_3, \alpha_4, \ldots)$ of scalars, let $T_1 f = (0, \alpha_2, \alpha_3, \ldots)$, $T_2 f = (0, 0, \alpha_3, \alpha_4, \ldots)$, $T_3 f = (0, 0, 0, \alpha_4, \ldots)$, etc. Clearly, $\|T_n f\| \to 0$, and so $T_n f \to o = Of$. Thus, $\{T_n\}$ converges to the operator $O$ in the sense that $\|(T_n - O)f\| \to 0$ for every vector $f$, but it is not true that $T_n \Rightarrow O$, for $\|T_n - O\| = \|T_n\|$, and each operator $T_n$ obviously has unit norm. Thus, we are led to the following definition: The sequence of operators $T_1, T_2, T_3, \ldots$ on any normed linear space is said to converge strongly, or pointwise, if for every vector $f$ the limit $\lim_{n \to \infty} T_n f$ exists. Since, obviously, the equality $\lim_{n \to \infty} T_n(\alpha f + \beta g) = \alpha \lim_{n \to \infty} T_n f + \beta \lim_{n \to \infty} T_n g$ holds for any choice of $\alpha$, $\beta$, $f$, and $g$, the strongly convergent sequence determines a linear transformation $T$ such that $T_n f \to Tf$ for every vector $f$, and we write $T_n \to T$.
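The gap between strong and uniform convergence in the truncation example can be checked numerically. The following is only a sketch under stated assumptions: vectors are finitely supported lists of real numbers (all later components are taken to be zero), and the helper names `norm`, `truncate`, and `unit_vector` are hypothetical, introduced here for illustration.

```python
import math

def norm(v):
    """l^2 norm of a finitely supported sequence stored as a list."""
    return math.sqrt(sum(abs(a) ** 2 for a in v))

def truncate(n, v):
    """T_n: replace the first n components of v by zero."""
    return [0.0] * min(n, len(v)) + list(v[n:])

def unit_vector(k, length):
    """The vector with 1 in position k (1-based) and 0 elsewhere."""
    return [1.0 if i == k - 1 else 0.0 for i in range(length)]

# A fixed vector f with components 1/k (cut off after 2000 terms):
# the norms ||T_n f|| shrink as n grows.
f = [1.0 / k for k in range(1, 2001)]
tail_norms = [norm(truncate(n, f)) for n in (1, 10, 100, 1000)]

# T_n leaves the unit vector e_{n+1} untouched, so ||T_n|| >= 1 for every n.
witness_norms = [norm(truncate(n, unit_vector(n + 1, n + 1)))
                 for n in (1, 10, 100, 1000)]
```

The list `tail_norms` decreases, while every entry of `witness_norms` equals 1, so the norms of the truncation operators cannot tend to zero.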

However, the question arises whether $T$ is necessarily bounded. Theorem 3-2 immediately provides an affirmative answer if $N$ is complete. On the other hand, the following simple example shows that $T$ may be unbounded if $N$ is not complete. Let $N$ consist of those sequences of scalars in which only a finite number of the scalar components differ from zero, let the norm be defined as in $l^2$, and let the vectors $T_n f$ be defined as follows for any vector $f = (\alpha_1, \alpha_2, \alpha_3, \alpha_4, \ldots)$:

$T_1 f = (\alpha_1, 2\alpha_2, \alpha_3, \alpha_4, \ldots)$,

$T_2 f = (\alpha_1, 2\alpha_2, 3\alpha_3, \alpha_4, \ldots)$,

$T_3 f = (\alpha_1, 2\alpha_2, 3\alpha_3, 4\alpha_4, \ldots)$, and so forth.

Clearly, $T_n f \to (\beta_1, \beta_2, \beta_3, \beta_4, \ldots)$, where $\beta_k = k\alpha_k$ for all indices $k$. The linear transformation $T$ thus defined on $N$ is clearly unbounded.
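To see concretely why the limit transformation is unbounded, one can apply it to unit vectors. The sketch below assumes real scalars and represents a finitely supported sequence as a list; `scale_by_index` and `unit_vector` are hypothetical helper names.

```python
import math

def norm(v):
    """l^2 norm of a finitely supported sequence stored as a list."""
    return math.sqrt(sum(abs(a) ** 2 for a in v))

def scale_by_index(v):
    """The limit transformation T: the k-th component alpha_k becomes k * alpha_k."""
    return [k * a for k, a in enumerate(v, start=1)]

def unit_vector(k):
    """The vector with 1 in position k (1-based) and 0 elsewhere."""
    return [0.0] * (k - 1) + [1.0]

# ||T e_k|| / ||e_k|| equals k, so no single bound works for all vectors.
ratios = [norm(scale_by_index(unit_vector(k))) / norm(unit_vector(k))
          for k in (1, 10, 100, 1000)]
```

Since the ratio grows with $k$, no constant $C$ can satisfy $\|Tf\| \le C\|f\|$ for all $f$ in $N$.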

We now turn to the third type of convergence, and we begin with a simple illustration. Once again let $N$ be taken as $l^2$. Let $T_1 f = (0, \alpha_1, \alpha_2, \alpha_3, \ldots)$, $T_2 f = (0, 0, \alpha_1, \alpha_2, \alpha_3, \ldots)$, $T_3 f = (0, 0, 0, \alpha_1, \alpha_2, \alpha_3, \ldots)$, and so forth, where, as before, $f = (\alpha_1, \alpha_2, \alpha_3, \ldots)$. It is easily seen that the sequence $\{T_n\}$ does not converge in either of the two senses defined previously, for if $f$ is taken as the vector $(1, 0, 0, 0, \ldots)$ it is obvious that $\|T_n f - T_m f\| = 2^{1/2}$ whenever $n \ne m$, and so the sequence $\{T_n f\}$ does not converge; hence the sequence $\{T_n\}$ does not converge strongly, and so it certainly does not converge uniformly. Nevertheless, a certain type of convergence is involved here. Let $l$ denote any bounded linear functional on $l^2$. Then we know that there exists a unique vector $g$, given by a sequence $(\gamma_1, \gamma_2, \gamma_3, \ldots)$, such that $l(f) = (f, g)$ and hence $l(T_n f) = (T_n f, g) = \sum_{k=1}^{\infty} \alpha_k \bar\gamma_{n+k}$. Employing the Schwarz inequality we obtain $|l(T_n f)|^2 \le \{\sum_{k=1}^{\infty} |\alpha_k|^2\}\{\sum_{k=n+1}^{\infty} |\gamma_k|^2\}$. Letting $n$ increase without bound, we conclude that $l(T_n f) \to 0 = l(Of)$.
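The behaviour of the shift operators can be illustrated numerically. This is a sketch only, assuming real scalars and finitely supported sequences stored as lists; the names `shift`, `inner`, and `distance`, and the particular vectors `f` and `g`, are hypothetical choices made for the illustration.

```python
import math

def shift(n, v):
    """T_n: prepend n zeros, moving every component n places to the right."""
    return [0.0] * n + list(v)

def inner(u, v):
    """Inner product of two real sequences; missing components count as zero."""
    return sum(a * b for a, b in zip(u, v))

def distance(u, v):
    """l^2 distance between two lists, padding the shorter one with zeros."""
    length = max(len(u), len(v))
    u = list(u) + [0.0] * (length - len(u))
    v = list(v) + [0.0] * (length - len(v))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# No strong convergence: T_n e_1 and T_m e_1 stay sqrt(2) apart when n != m.
e1 = [1.0]
gap = distance(shift(3, e1), shift(7, e1))

# Weak convergence: the pairings (T_n f, g) shrink for a fixed g.
g = [1.0 / k for k in range(1, 5001)]   # square-summable, cut off after 5000 terms
f = [1.0, -2.0, 0.5]
pairings = [abs(inner(shift(n, f), g)) for n in (1, 10, 100, 1000)]
```

Here `gap` equals $2^{1/2}$ regardless of how large the two indices are, while `pairings` decreases toward zero.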

We are thus led to the following definition: The sequence of operators $\{T_n\}$ is said to converge weakly to the operator $T$ if for every vector $f$ in $N$ and every bounded linear functional $l$ on $N$ the limiting relation $l(T_n f) \to l(Tf)$ holds, and we write $T_n \rightharpoonup T$. Having defined three kinds of convergence of sequences of operators, we can analogously define three kinds of convergence of series of operators; we shall say that the series $S_1 + S_2 + S_3 + \cdots$ of operators converges to the operator $S$ (uniformly, strongly, weakly) if the sequence $S_1$, $S_1 + S_2$, $S_1 + S_2 + S_3$, $\ldots$ of partial sums converges to $S$ (uniformly, strongly, weakly). In particular, we shall employ several times the following simple but important theorem, whose proof is left as Exercise 1.

Theorem 2: If the numerical series $\|S_1\| + \|S_2\| + \cdots$ is convergent, where the $S_i$'s are operators on a complete normed linear space, then the series $S_1 + S_2 + S_3 + \cdots$ converges uniformly to an operator $S$. The sum $S$ of the series is not affected by rearrangement of the terms of the series.

Exercises

1. Prove Theorem 2.

2. Consider the following theorem: If the sequence of operators $\{S_n\}$ converges to the operator $S$ and the sequence of operators $\{T_n\}$ converges to the operator $T$, then the sequence of operators $\{S_n T_n\}$ converges to the operator $ST$. This is not well formulated, since the kind of convergence is not specified. First, show that the theorem is true if the word "uniformly" is inserted after each appearance of the word "converges." Then determine, in as many cases as you can, the truth or falsity of the other 26 possible interpretations of this "theorem."

3. Prove the truth or falsity of the following theorems, where the operators $T_n$ and $T$ are defined on a normed linear space:

(a) If $T_n \Rightarrow T$, then $T_n^* \Rightarrow T^*$.

(b) If $T_n \to T$, then $T_n^* \to T^*$.

(c) If $T_n \rightharpoonup T$, then $T_n^* \rightharpoonup T^*$.

§5. HERMITIAN OPERATORS 

In this section it is especially important to note that we are dealing with complex scalars; in particular, Theorem 1 and its corollary are false in the case of real Hilbert spaces. (Cf. Exercise 1.) A fixed Hilbert space $H$ is under consideration.

Theorem 1: Let $T$ be an operator on $H$. If $(Tf, f) = 0$ for every vector $f$ in $H$, then $T = O$.

Proof: Take any two vectors $f$, $g$. By hypothesis, $0 = (T(f + g), f + g) = (Tf, f) + (Tg, g) + (Tf, g) + (Tg, f) = 0 + 0 + (Tf, g) + (Tg, f)$. Thus, $(Tf, g) + (Tg, f) = 0$; replacing $f$ by $if$, we obtain $(Tf, g) - (Tg, f) = 0$. Adding these last two equations, we obtain $(Tf, g) = 0$; choosing $f$ at pleasure and then choosing $g$ as $Tf$, we obtain $(Tf, Tf) = 0$, or $Tf = o$. Thus, $T = O$.
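The role of complex scalars in this proof can be seen in a two-dimensional illustration. The sketch below is an assumption-laden example rather than part of the text's argument: it takes $T$ to be rotation of the plane through a right angle, uses the inner product $(u, v) = \sum u_k \bar v_k$, and the names `apply` and `inner` are hypothetical.

```python
def apply(matrix, v):
    """Matrix-vector product for small lists of numbers."""
    return [sum(matrix[i][j] * v[j] for j in range(len(v)))
            for i in range(len(matrix))]

def inner(u, v):
    """(u, v) = sum of u_k * conjugate(v_k); linear in the first argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

rotation = [[0, -1], [1, 0]]   # rotation of the plane through 90 degrees

# For real vectors f, (Tf, f) vanishes although T is not the zero operator.
real_values = [inner(apply(rotation, f), f)
               for f in ([1, 0], [0, 1], [3, -2], [1.5, 4])]

# Allowing a complex vector exposes a non-zero value of (Tf, f).
complex_value = inner(apply(rotation, [1, 1j]), [1, 1j])
```

Every entry of `real_values` is zero, whereas `complex_value` is $-2i$; the step "replace $f$ by $if$" is exactly what is unavailable over the reals.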

Corollary: If $(T_1 f, f) = (T_2 f, f)$ for all vectors $f$, then $T_1 = T_2$.

Proof: Rewrite the preceding equality in the form

$$((T_1 - T_2)f, f) = 0$$

and apply the preceding theorem.

Definition 1: An operator $T$ is said to be hermitian, or self-adjoint, if $T = T^*$. (Trivial examples are provided by the operators $O$ and $I$; later we shall encounter less obvious examples.)

Theorem 2: (a) The sum of any finite number of hermitian operators is hermitian.

(b) The limit (weak, strong, or uniform) of hermitian operators is hermitian.

(c) A real scalar multiple of a hermitian operator is hermitian.

(d) The product of two hermitian operators is hermitian iff the given operators commute.

Proof: Left as Exercise 2.

Theorem 3: The operator $T$ is hermitian iff $(Tf, f)$ is real for all vectors $f$.

Proof: (a) If $T$ is hermitian, then, for every vector $f$, $(Tf, f) = (f, Tf) = \overline{(Tf, f)}$. Hence $(Tf, f)$ equals its conjugate, and so it is real.

(b) If $(Tf, f)$ is always real, then $(Tf, f) = \overline{(Tf, f)} = (f, Tf) = (T^* f, f)$. By the corollary to Theorem 1, $T = T^*$, and so $T$ is hermitian.

Theorem 4: If $T$ is hermitian, so are all the operators $T^2, T^3, \ldots$.

Proof: Since $T$ commutes with itself, part (d) of Theorem 2 guarantees that $T^2$ is hermitian; since $T$ and $T^2$ commute, this same result guarantees that $T^3$ is hermitian. Since this argument can be repeated indefinitely, we obtain the desired result.

Theorem 5: If $T$ is hermitian and $\ne O$, then no power of $T$ is $O$. (That is, a hermitian operator cannot be nilpotent.)

Proof: By hypothesis there exists a vector $f$ such that $Tf \ne o$. Hence $0 < (Tf, Tf) = (T^2 f, f)$, and so $T^2 f \ne o$. Similarly, $T^4 f \ne o$, $T^8 f \ne o$, and so forth. If $T^m f = o$ for some positive integer $m$, we could choose the integer $k$ so large that $2^k > m$, and we would obtain $T^{2^k} f = T^{2^k - m}(T^m f) = o$, contradicting the result obtained in the previous sentence.

An alternative proof goes as follows: For any operator $T$, hermitian or not, we have seen that $\|T^* T\| = \|T\|^2$. Thus, if $T$ is hermitian, we obtain $\|T^2\| = \|T\|^2$, and so $T^2 \ne O$; similarly, $T^4 \ne O$, $T^8 \ne O$, and so forth. (Cf. Exercise 3.)

Theorem 6: If $T$ is hermitian, $\|T\| = \sup_{\|f\| = 1} |(Tf, f)|$.

Proof: (a) Denote the right side of the equality by $\mu$. Then for any unit vector $f$, $|(Tf, f)| \le \|Tf\| \cdot \|f\| \le \|T\| \cdot \|f\| \cdot \|f\| = \|T\|$. Letting $f$ vary freely under the sole restriction $\|f\| = 1$, we obtain $\mu \le \|T\|$. (Note that this half of the proof does not use the hermitian character of $T$.)

(b) Let $f$ be any vector and let $\lambda$ be any real positive number. Taking account of the definition of $\mu$ and of its obvious consequence $|(Tg, g)| \le \mu \|g\|^2$, we obtain

$$|(T(\lambda f \pm \lambda^{-1} Tf), \lambda f \pm \lambda^{-1} Tf)| \le \mu \|\lambda f \pm \lambda^{-1} Tf\|^2,$$

and so (using the triangle inequality for real numbers)

$$|(T(\lambda f + \lambda^{-1} Tf), \lambda f + \lambda^{-1} Tf) - (T(\lambda f - \lambda^{-1} Tf), \lambda f - \lambda^{-1} Tf)| \le \mu \{\|\lambda f + \lambda^{-1} Tf\|^2 + \|\lambda f - \lambda^{-1} Tf\|^2\}.$$

A routine calculation shows that the left side is simply $4\|Tf\|^2$. (In this calculation the hermitian character of $T$ is used.) The right side reduces, by the parallelogram law, to $2\mu(\lambda^2 \|f\|^2 + \lambda^{-2} \|Tf\|^2)$. Thus, for any unit vector $f$ and any positive number $\lambda$ we obtain

$$4\|Tf\|^2 \le 2\mu(\lambda^2 + \lambda^{-2} \|Tf\|^2).$$

A trivial calculation shows that the right side, considered as a function of $\lambda$, is minimized by choosing $\lambda = \|Tf\|^{1/2}$. Since the left side of the inequality is independent of $\lambda$, we obtain $4\|Tf\|^2 \le 4\mu \|Tf\|$. Dividing by $4\|Tf\|$ we obtain $\|Tf\| \le \mu$, and so $\|T\| \le \mu$; combining this result with that obtained in part (a), we conclude that $\|T\| = \mu$. (The division by $4\|Tf\|$ is not legitimate if $\|Tf\| = 0$, but in this case the desired inequality, namely $\|Tf\| \le \mu$, is trivially true.)





Corollary: If $T$ is hermitian, $\|T\| = \max\{|m|, |M|\}$, where

$$m = \inf_{\|f\| = 1} (Tf, f), \qquad M = \sup_{\|f\| = 1} (Tf, f).$$

Proof: Obvious from Theorem 6.

Definition 2: A hermitian operator $T$ is said to be positive if $(Tf, f) \ge 0$ for all vectors $f$. (Examples: $T = O$, $T = I$.) Obviously, $\|T\| = M$ in this case, where $M$ is defined in the preceding corollary.

This definition suggests the possibility of introducing an ordering into the family of hermitian operators, analogous to the relation $\le$ which exists in $R$. Actually, only a partial ordering is provided by the following definition. (Cf. Exercise 4.)

Definition 3: If $T_1$ and $T_2$ are hermitian, then we write $T_1 \ge T_2$ iff $T_1 - T_2$ is positive. (In particular, $T \ge O$ iff $T$ is positive.)

Theorem 7: If $T_1 \ge T_2$ and $T_3 \ge T_4$ and if $\alpha$ and $\beta$ are non-negative real numbers, then $\alpha T_1 + \beta T_3 \ge \alpha T_2 + \beta T_4$. Also, $mI \le T \le MI$, where $m$ and $M$ are the numbers defined in the corollary to Theorem 6. (Cf. Exercise 5.)

Proof: Trivial.

It should be noted that the product of positive operators need not be 
positive, for the operators may fail to commute, in which case their 
product is not hermitian. However, it is an interesting (and non-trivial) 
fact that the product of commuting positive operators is indeed positive. 
This will be proven in Theorem 9; first we need the following interesting 
theorem, however. 

Theorem 8: If $T \ge O$, there exists a positive hermitian operator $S$ satisfying $S^2 = T$. (That is, every positive operator has at least one positive square root.)

Proof: Trivial if $T = O$. If $T \ne O$, we may confine attention to the case that $\|T\| = 1$ (for otherwise we can work with $(1/\|T\|)T$, whose norm is 1, and multiply the square root obtained in this case by $\|T\|^{1/2}$). Then, clearly, $O \le T \le I$ and $O \le I - T \le I$.

Now we recall from calculus the Taylor expansion

$$(1 - u)^{1/2} = 1 - c_1 u - c_2 u^2 - \cdots,$$

where the $c$'s are positive. (Their exact values can be written down, but are of no importance.) This expansion is easily shown to be valid for $-1 < u < 1$, but we shall show that it is, in fact, also valid for $u = 1$. (Also for $u = -1$, but this is of no importance here.) Since the $c$'s are all positive, we can speak of their sum, if we allow $+\infty$. Now, if the sum were $+\infty$, or even any finite number $> 1$, a finite number of the $c$'s would add up to more than 1, say $c_1 + c_2 + \cdots + c_n > 1$. Then, by continuity, $c_1 u + c_2 u^2 + \cdots + c_n u^n > 1$ for $u$ sufficiently close to 1, and so we would have $c_1 u + c_2 u^2 + \cdots + c_n u^n + \cdots > 1$. But this would say that $1 - c_1 u - c_2 u^2 - \cdots$ is negative when $u$ is sufficiently close to 1, and so we obtain the absurd inequality $(1 - u)^{1/2} < 0$ for $u$ close to 1. Thus, the series $c_1 + c_2 + \cdots + c_n + \cdots$ adds up to 1 or less. But if the sum were less than one, $c_1 u + c_2 u^2 + \cdots + c_n u^n + \cdots$ would be bounded away from 1 for $0 < u < 1$, and so $(1 - u)^{1/2}$ would not approach zero as $u \to 1$; this is also absurd. Hence, $c_1 + c_2 + \cdots = 1$, and so the series expansion given for $(1 - u)^{1/2}$ is correct for $u = 1$. Now, let us write out the series

$$I - c_1(I - T) - c_2(I - T)^2 - \cdots.$$

The partial sums ($T_n = I - c_1(I - T) - c_2(I - T)^2 - \cdots - c_n(I - T)^n$) are a sequence of hermitian (why?) operators. Furthermore, if $m > n$,

$$\|T_m - T_n\| = \|c_{n+1}(I - T)^{n+1} + c_{n+2}(I - T)^{n+2} + \cdots + c_m(I - T)^m\| \le \sum_{k=n+1}^{m} \|c_k(I - T)^k\| = \sum_{k=n+1}^{m} c_k \|(I - T)^k\| \le \sum_{k=n+1}^{m} c_k.$$

Since $\sum c_k$ is convergent, the sum $\sum_{k=n+1}^{m} c_k$ can be made $< \epsilon$ by choosing $n > N(\epsilon)$. Hence, the sequence of partial sums $T_1, T_2, \ldots$ converges (in norm) to a hermitian operator $S$. Thus, we may write

$$S = I - c_1(I - T) - c_2(I - T)^2 - \cdots.$$

Now, if we square the series expansion $1 - c_1 u - c_2 u^2 - \cdots$ we have $1 - u$, and similarly we find that $S^2 = I - (I - T) = T$. Furthermore, since $c_1 + c_2 + \cdots + c_n + \cdots = 1$, it immediately follows that

$$\|c_1(I - T) + c_2(I - T)^2 + \cdots\| \le 1,$$

and so $O \le c_1(I - T) + c_2(I - T)^2 + \cdots \le I$. Therefore, $O \le I - c_1(I - T) - c_2(I - T)^2 - \cdots = S \le I$, and so $S$ is a positive operator.
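The series construction of the square root can be tried on a small matrix. The following is a minimal sketch, assuming a real symmetric 2 x 2 matrix $T$ with $O \le T \le I$; it uses the recurrence $c_1 = 1/2$, $c_k = c_{k-1}(2k - 3)/(2k)$ for the binomial coefficients (a standard fact not stated in the text), and the helper names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_sub(a, b):
    """Entrywise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    """Scalar multiple c * a."""
    return [[c * x for x in row] for row in a]

def sqrt_by_series(t, terms=200):
    """Partial sum I - c_1 (I - T) - ... - c_n (I - T)^n with n = terms."""
    n = len(t)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = mat_sub(identity, t)       # D = I - T
    power = identity               # holds D^k after the k-th pass
    total = identity
    c = 0.0
    for k in range(1, terms + 1):
        # c_1 = 1/2, then c_k = c_{k-1} * (2k - 3) / (2k)
        c = 0.5 if k == 1 else c * (2 * k - 3) / (2 * k)
        power = mat_mul(power, d)
        total = mat_sub(total, scale(c, power))
    return total

t = [[0.5, 0.2], [0.2, 0.6]]       # symmetric, eigenvalues between 0 and 1
s = sqrt_by_series(t)
s_squared = mat_mul(s, s)
```

For this choice of `t` the product `s_squared` reproduces `t` to within rounding error, and `s` is again symmetric, as the proof predicts for the limit of hermitian partial sums.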

Theorem 9: If $A$ and $B$ are positive commuting operators, then $AB$ is positive.

Proof: The hermitian character of $AB$ is assured by Theorem 2. We construct positive square roots $S_1$ and $S_2$ of $A$ and $B$, respectively, in the manner described in the preceding proof. Since $B$ commutes with $A$, $B$ commutes with all polynomials in $A$; this suggests that $B$ commutes with $S_1$. We demonstrate this as follows: Let $Q_1, Q_2, Q_3, \ldots$ be the partial sums of the infinite series representing $S_1$. Then $BS_1 - S_1 B = BQ_n + B(S_1 - Q_n) - (S_1 - Q_n)B - Q_n B = B(S_1 - Q_n) - (S_1 - Q_n)B$. Hence,

$$\|BS_1 - S_1 B\| \le \|B(S_1 - Q_n)\| + \|(S_1 - Q_n)B\| \le \|B\| \cdot \|S_1 - Q_n\| + \|S_1 - Q_n\| \cdot \|B\| = 2\|B\| \cdot \|S_1 - Q_n\| \to 0,$$

and so $BS_1 - S_1 B = O$, or $BS_1 = S_1 B$.

Similarly, it follows that $S_1$ commutes with $S_2$. Therefore, $S_1 S_2$ is hermitian; furthermore, $(S_1 S_2)^2 = S_1 S_2 S_1 S_2 = S_1 S_1 S_2 S_2 = S_1^2 S_2^2 = AB$. For any vector $f$, $(ABf, f) = ((S_1 S_2)^2 f, f) = (S_1 S_2 f, S_1 S_2 f) \ge 0$, and so $AB$ is positive.

Theorem 10: A positive operator $T$ possesses a unique positive square root.

Proof: Let $S$ be the square root of $T$ constructed in the proof of Theorem 8, and suppose that $S_1$ is any positive square root of $T$; we shall show that $S = S_1$. Since $S_1 T = S_1(S_1^2) = (S_1^2)S_1 = TS_1$, we see that $S_1$ commutes with $T$. By the argument employed in the preceding proof, we see that $S_1$ commutes with $S$. Now we construct positive square roots of $S$ and $S_1$ in the manner employed in the proof of Theorem 8; we call these square roots $R$ and $R_1$, respectively. For any vector $f$, we obtain (using the fact that $S_1$ and $S$ commute and also the fact that all operators involved are hermitian):

$$\|R(S - S_1)f\|^2 + \|R_1(S - S_1)f\|^2 = (R^2(S - S_1)f, (S - S_1)f) + (R_1^2(S - S_1)f, (S - S_1)f)$$
$$= (S(S - S_1)f, (S - S_1)f) + (S_1(S - S_1)f, (S - S_1)f)$$
$$= ((S + S_1)(S - S_1)f, (S - S_1)f)$$
$$= ((S^2 - S_1^2)f, (S - S_1)f)$$
$$= ((T - T)f, (S - S_1)f)$$
$$= (o, (S - S_1)f) = 0.$$

Thus $R(S - S_1) = O = R_1(S - S_1)$, and so $R^2(S - S_1) = O = R_1^2(S - S_1)$; this may be rewritten in the form $S(S - S_1) = S_1(S - S_1)$, or $(S - S_1)^2 = O$. Since, as we have seen, a hermitian operator cannot be nilpotent, it follows that $S - S_1 = O$, or $S = S_1$.

Having established this uniqueness result, we are now justified in referring to the positive square root of a given positive operator $T$, which we shall denote $T^{1/2}$. If we take any hermitian operator $A$, $A^2$ is positive, and so $(A^2)^{1/2}$ is the unique positive solution of the operator equality $X^2 = A^2$. For obvious reasons, we denote this operator as the absolute value of $A$: $|A| = (A^2)^{1/2}$.

We conclude this section by stating without proof the following theorem, which serves to emphasize the similarity between the hermitian operators and the real numbers when considered as members of $\mathcal{B}(H)$ and $\mathbb{C}$, respectively.

Theorem 11: Given any hermitian operator $A$, the operators $\frac{1}{2}(|A| \pm A)$ are positive, they commute, and their product is the zero operator. (For obvious reasons, we call $\frac{1}{2}(|A| + A)$ and $\frac{1}{2}(|A| - A)$ the positive part and negative part of $A$, respectively, and we denote them $A^+$ and $A^-$; thus, $A = A^+ - A^-$.)

Exercises

1. Prove that Theorem 1 and its corollary are false in a real Hilbert space.

2. Prove Theorem 2.

3. Prove that the equality $\|T^m\| = \|T\|^m$ holds for any positive integer $m$ if $T$ is hermitian.

4. Show that two hermitian operators $A$, $B$ may be incomparable; i.e., neither $A \ge B$ nor $B \ge A$ may be true.

5. Prove that if $T_1 \ge T_2$ and $T_2 \ge T_1$, then $T_1 = T_2$.

6. Prove the generalized Schwarz inequality: If $T$ is a positive operator, then $|(Tf, g)|^2 \le (Tf, f)(Tg, g)$ for all vectors $f$ and $g$. (Thus, if $(Tf, f) = 0$, it follows that $(Tf, g) = 0$ for all vectors $g$; in particular, $(Tf, Tf) = 0$, or $Tf = o$. Therefore, a positive operator cannot map a vector $f$ into a vector orthogonal to $f$ except by annihilating $f$.)

§6. PROJECTIONS 

Although the concept of a projection is of significance in Banach 
spaces, and even in linear spaces which are not provided with a norm, we 
shall confine attention entirely to Hilbert spaces. 





Definition 1: Let $M$ be any subspace of $H$ and let each vector $f$ of $H$ be expressed (in accordance with the projection theorem) in the form $f = g + h$, where $g \in M$ and $h \in M^\perp$. We call $g$ the projection (more strictly, the orthogonal projection) of $f$ on $M$, and we write $g = P_M f$. (Similarly, of course, $h = P_N f$, where $N = M^\perp$.)

Theorem 1: $P_M$ is an operator, its norm is 1 except when $M = \{o\}$, and it is hermitian, positive, and idempotent (i.e., $P_M^2 = P_M$).

Proof: If $f = g + h$ is the break-up of $f$, then $\alpha f = (\alpha g) + (\alpha h)$ is a legitimate break-up of $\alpha f$; by uniqueness, it is the only break-up of $\alpha f$, and so $P_M(\alpha f) = \alpha P_M f$. Similarly, $P_M(f_1 + f_2) = P_M f_1 + P_M f_2$. Thus, $P_M$ is linear. Since $\|f\|^2 = \|g\|^2 + \|h\|^2$, we have $\|g\| \le \|f\|$, or $\|P_M f\| \le \|f\|$. Thus, $P_M$ is bounded; in fact, $\|P_M\| \le 1$. If $M$ contains a non-zero vector $f$, then for this particular $f$ we have the break-up $f = f + o$, and so $\|P_M f\| = \|f\|$, showing that $\|P_M\| \ge 1$. Thus, $\|P_M\|$ is exactly 1 except when $M = \{o\}$, in which case, of course, $P_M = O$. Let $f_1 = g_1 + h_1$ and $f_2 = g_2 + h_2$. Then $(P_M f_1, f_2) - (f_1, P_M f_2) = (g_1, g_2 + h_2) - (g_1 + h_1, g_2) = (g_1, g_2) + (g_1, h_2) - (g_1, g_2) - (h_1, g_2) = (g_1, h_2) - (h_1, g_2) = 0$, since $(g_1, h_2) = 0$ and $(h_1, g_2) = 0$. Thus, for all vectors $f_1$ and $f_2$, $(P_M f_1, f_2) = (f_1, P_M f_2)$, and so $P_M^* = P_M$. Finally, for any $f$, we have the break-up $f = g + h$, where $g = P_M f$, and for $g$ we obviously have the break-up $g = g + o$, so that $g = P_M g$. Thus, $P_M f = P_M(P_M f) = P_M^2 f$. Hence, $P_M^2 = P_M$. Since $P_M^2$ is a positive operator, so is $P_M$ (because $P_M = P_M^2$).

Theorem 2: An operator $P$ is a projection iff it is hermitian and idempotent.

Proof: We already have half the result: if $P$ is a projection, it must be hermitian and idempotent. Now suppose $P$ is hermitian and idempotent. Let $M$ be the range of $P$. For every $f$ in $M$, it is possible to find a vector $g$ in $H$ (perhaps more than one) such that $Pg = f$. But then $Pf = PPg = P^2 g = Pg = f$. Thus, every member of $M$ is unchanged by the application of $P$. Furthermore, if $f = Pf$, then $f \in M$ (from the very definition of $M$). Thus, $M$ consists precisely of those vectors which satisfy $f = Pf$. If $f_1$ and $f_2$ each belong to $M$, then $P(\alpha f_1 + \beta f_2) = \alpha Pf_1 + \beta Pf_2 = \alpha f_1 + \beta f_2$, and so $\alpha f_1 + \beta f_2$ is unchanged by $P$; thus $M$ is a linear manifold. Finally, if $f_1, f_2, \ldots$ all belong to $M$ and if this sequence converges to $f$, we have $Pf = P(f - f_n) + Pf_n = P(f - f_n) + f_n$; hence, $f_n - Pf = P(f_n - f)$, and so $\|f_n - Pf\| = \|P(f_n - f)\| \le \|P\| \cdot \|f_n - f\| \to 0$. Therefore $\|f_n - Pf\| \to 0$, and so $f_n \to Pf$. But $f_n \to f$, and so $f = Pf$ (since limits are unique). Thus, $M$ is closed, and so $M$ is a subspace.

Now, let $h \in M^\perp$. Since $P$ is hermitian, $(Ph, Ph) = (h, P^2 h)$, and since $P = P^2$, $(Ph, Ph) = (h, Ph)$. But $h \in M^\perp$ and $Ph \in M$, and so $(h, Ph) = 0$. Therefore, $(Ph, Ph) = 0$, and so $Ph = o$. To sum up, $Pg = g$ for every vector $g$ in $M$ and $Ph = o$ for every vector $h$ in $M^\perp$. Then, for any vector $f$, we can write $f = g + h$, where $g \in M$ and $h \in M^\perp$ (by the projection theorem), and so $Pf = Pg + Ph = g + o = g$; that is, $Pf = P_M f$ (where $P_M$ denotes projection upon $M$). Hence, $P = P_M$, and so $P$ really is a projection; in fact, it is the projection upon its range.
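In finite dimensions the two defining properties are easy to test. The sketch below assumes the real space of triples with the usual inner product and takes $M$ to be the line spanned by a vector $u$; the function `projection_onto` and the choice of $u$ are hypothetical, made only for the illustration.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(matrix, v):
    """Matrix-vector product."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]

def projection_onto(u):
    """Matrix of the orthogonal projection onto the line spanned by the real vector u."""
    uu = sum(x * x for x in u)
    return [[x * y / uu for y in u] for x in u]

u = [1.0, 2.0, 2.0]
p = projection_onto(u)
p_squared = mat_mul(p, p)

# Hermitian (here: symmetric) and idempotent, up to rounding.
is_symmetric = all(abs(p[i][j] - p[j][i]) < 1e-12 for i in range(3) for j in range(3))
is_idempotent = all(abs(p_squared[i][j] - p[i][j]) < 1e-12
                    for i in range(3) for j in range(3))

# P fixes its range and annihilates a vector orthogonal to u.
fixed = apply(p, u)                    # should reproduce u
killed = apply(p, [2.0, -1.0, 0.0])    # (2, -1, 0) is orthogonal to u
```

Both flags come out true, `fixed` reproduces `u`, and `killed` is the zero vector, matching the summary "$Pg = g$ on $M$ and $Ph = o$ on $M^\perp$."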

Corollary: (a) If $P$ is a projection so is $I - P$, and conversely.

(b) If $P$ is the projection upon $M$, then $I - P$ is the projection upon $M^\perp$.

Proof: If $P$ is a projection, $P$ is hermitian, and so $(I - P)^* = I^* - P^* = I - P$, so that $I - P$ is hermitian. Furthermore, $(I - P)^2 = I - 2P + P^2 = I - 2P + P = I - P$, and so $I - P$ is idempotent. By Theorem 2, $I - P$ is a projection. Conversely, if $I - P$ is a projection, so is $I - (I - P) = P$. This completes the proof of part (a). The proof of part (b) is left as Exercise 1.

Theorem 3: (a) If $P$ and $Q$ are projections, then $PQ$ is a projection iff $PQ = QP$.

(b) If $P$ and $Q$ are projections, then $P + Q$ is a projection iff $PQ = O$.

(c) If $P$ and $Q$ are projections, then $P - Q$ is a projection iff $PQ = Q$.

Proof: (a) If $PQ = QP$, then $PQ$ is hermitian by Theorem 5-2; furthermore, $(PQ)^2 = PQPQ = PPQQ = P^2 Q^2 = PQ$, and so $PQ$ is idempotent. Theorem 2 now guarantees that $PQ$ is a projection. If $PQ \ne QP$, then $PQ$ is not hermitian (by Theorem 5-2), and so it is certainly not a projection.

(b) $P + Q$ is certainly hermitian, and $(P + Q)^2 = PQ + QP + P^2 + Q^2 = P + Q + QP + PQ$. Thus, $P + Q$ is a projection iff $QP + PQ = O$. If $PQ = O$, then $O = O^* = (PQ)^* = Q^* P^* = QP$, and so $PQ + QP = O$. Thus, the condition $PQ = O$ implies that $P + Q$ is a projection. On the other hand, if $PQ + QP = O$, multiplication on the left by $P$ (and use of the equality $P^2 = P$) furnishes the equality $PQ + PQP = O$; similarly, multiplication on the right by $P$ furnishes the equality $PQP + QP = O$. Subtraction of the last two equalities furnishes the result $PQ - QP = O$, or $PQ = QP$. Hence $O = PQ + QP = PQ + PQ = 2PQ$, or $PQ = O$. This completes the proof of part (b).

(c) By the corollary to Theorem 2, $P - Q$ is a projection iff $I - (P - Q)$, or $(I - P) + Q$, is a projection. Employing part (b) of the present theorem, with $P$ replaced by $I - P$, we see that $P - Q$ is a projection iff $(I - P)Q = O$, or $Q = PQ$.
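Parts (a) and (b) can be illustrated with projections onto lines in real 3-space. This is a sketch under stated assumptions: the three lines chosen below and the helper names are hypothetical, and "hermitian" reduces to "symmetric" because the scalars are real.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(a, b):
    """Entrywise sum a + b."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def close(a, b, tol=1e-12):
    """True when two matrices agree entrywise up to tol."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def projection_onto(u):
    """Orthogonal projection onto the line spanned by the real vector u."""
    uu = sum(x * x for x in u)
    return [[x * y / uu for y in u] for x in u]

zero = [[0.0] * 3 for _ in range(3)]
p = projection_onto([1.0, 0.0, 0.0])
q = projection_onto([0.0, 1.0, 1.0])   # orthogonal to the range of p
r = projection_onto([1.0, 1.0, 0.0])   # neither orthogonal to nor equal to it

# Orthogonal ranges: PQ = O, and P + Q is idempotent.
pq_is_zero = close(mat_mul(p, q), zero)
sum_pq = mat_add(p, q)
sum_pq_idempotent = close(mat_mul(sum_pq, sum_pq), sum_pq)

# Non-commuting pair: PR differs from RP, and P + R fails to be idempotent.
pr_commute = close(mat_mul(p, r), mat_mul(r, p))
sum_pr = mat_add(p, r)
sum_pr_idempotent = close(mat_mul(sum_pr, sum_pr), sum_pr)
```

With these choices `pq_is_zero` and `sum_pq_idempotent` are true, while `pr_commute` and `sum_pr_idempotent` are false, in line with the stated conditions.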

The three parts of Theorem 3 have all been established by "pushing around" symbols, but it is very important to visualize the geometrical significance of the various conditions that are imposed upon the operators $P$ and $Q$. Careful attention to Exercise 2 is therefore strongly advised.





Exercises

1. Prove part (b) of the corollary to Theorem 2.

2. Let $P$ and $Q$ be projections associated with subspaces $M_P$ and $M_Q$ respectively. Keep in mind that $M_P$ and $M_Q$ are themselves Hilbert spaces.

(a) Prove that $PQ = QP$ iff the orthogonal complement of $M_P \cap M_Q$ with respect to $M_P$ and the orthogonal complement of $M_P \cap M_Q$ with respect to $M_Q$ are orthogonal to each other. Also, show that when this condition is satisfied, $PQ$ is the projection associated with the subspace $M_P \cap M_Q$. (Hint: Try to visualize the situation when $H$ is euclidean 3-space.)

(b) Prove that $PQ = O$ iff $M_P$ and $M_Q$ are orthogonal; equivalent formulations are that $M_P \subseteq (M_Q)^\perp$ and that $M_Q \subseteq (M_P)^\perp$. When this condition is satisfied, show that $P + Q$ is the projection on the direct sum of $M_P$ and $M_Q$; by this expression we mean the set of all vectors expressible as the sum of a vector belonging to $M_P$ and a vector belonging to $M_Q$. (Such an expression is unique, and the direct sum is indeed a subspace; prove these statements.)

(c) Prove that $PQ = Q$ iff $M_Q \subseteq M_P$, and that when this condition is satisfied $P - Q$ is the projection associated with the subspace $M_P \cap (M_Q)^\perp$ (i.e., the set of all vectors in $M_P$ which are orthogonal to $M_Q$).

§7. THE SPECTRUM OF AN OPERATOR 

We shall discuss in this section some of the basic ideas relating to the spectrum of an operator on a Banach space $B$. Recall that the operator $T$ is invertible iff it is one-to-one and onto; the boundedness of $T^{-1}$ need not be assumed, since it is guaranteed by the second corollary to Theorem 3-1.


Theorem 1: If the operators $T_1$ and $T_2$ are both invertible, then so is $T_1 T_2$, and $(T_1 T_2)^{-1} = T_2^{-1} T_1^{-1}$.

Proof: Merely observe that $T_2^{-1} T_1^{-1}$ is both a left and right inverse of $T_1 T_2$.

Theorem 2: If the operators $T_1$ and $T_2$ commute and if $T_1 T_2$ is invertible, then $T_1$ and $T_2$ are both invertible. (Cf. Exercise 1.)

Proof: Let the inverse of $T_1 T_2$, whose existence is assumed, be denoted $S$. Then $T_1 T_2 S = I$ and $S T_1 T_2 = I$. The latter equation may be rewritten $S T_2 T_1 = I$, since $T_1$ and $T_2$ are assumed to commute. Thus, $T_1$ possesses a left inverse (namely $S T_2$) and a right inverse (namely $T_2 S$). Hence, $S T_2 = T_2 S = T_1^{-1}$. Similarly $S T_1 = T_1 S = T_2^{-1}$.

Theorem 3: If $\|T - I\| < 1$, then $T^{-1}$ exists; equivalently, if $\|S\| < 1$, then $(I - S)^{-1}$ exists.

Proof: We work with the second formulation of the theorem. The terms of the series $I + S + S^2 + S^3 + \cdots$ are dominated in norm respectively by the terms of the series $1 + \|S\| + \|S\|^2 + \|S\|^3 + \cdots$. The latter series converges (since $\|S\| < 1$) and so the original series converges to an operator which we term $\tilde S$. Then $\tilde S(I - S) = \tilde S - \tilde S S = (I + S + S^2 + \cdots) - (S + S^2 + S^3 + \cdots) = I$, and so $\tilde S$ is a left inverse of $I - S$. Similarly, $\tilde S$ is a right inverse of $I - S$. Hence $(I - S)^{-1}$ exists and equals $\tilde S$.

Corollary: If $T$ is invertible there exists a positive number $\epsilon$ such that all operators $\tilde T$ satisfying the condition $\|\tilde T - T\| < \epsilon$ are also invertible; in fact, we may choose $\epsilon = 1/\|T^{-1}\|$.

Proof: Since $T$ is invertible, $\tilde T = T - (T - \tilde T) = T[I - T^{-1}(T - \tilde T)]$. If $\|T - \tilde T\| < 1/\|T^{-1}\|$, the inequalities $\|T^{-1}(T - \tilde T)\| \le \|T^{-1}\| \cdot \|T - \tilde T\| < 1$ hold, and so $I - T^{-1}(T - \tilde T)$ is invertible. It then follows from Theorem 1 that $\tilde T$ is also invertible.

It should be pointed out that Theorem 3 is very closely related to the contraction theorem of Chapter 1. The equation $(I - S)f = g$ may be rewritten in the form $f = g + Sf$. The (non-linear) mapping $f \to g + Sf$ is contracting for any choice of the vector $g$, since $\|(g + Sf_1) - (g + Sf_2)\| = \|S(f_1 - f_2)\| \le \|S\| \cdot \|f_1 - f_2\|$ and $\|S\| < 1$ by hypothesis. Since we are dealing with a complete metric space, the contraction theorem guarantees that the equation $f = g + Sf$ possesses a unique solution; thus $I - S$ is indeed invertible. If we use the method of iteration, employing the initial guess $o$, we obtain the following sequence of approximations: $o$, $g$, $g + Sg$, $g + Sg + S^2 g$, $\ldots$, and so in the limit the solution $f = (I + S + S^2 + S^3 + \cdots)g$ is obtained, in agreement with the formula obtained for $(I - S)^{-1}$.
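The method of iteration described above is easy to carry out for a small matrix. The following is a minimal sketch, assuming the plane with the maximum norm (for which the operator norm of a matrix is its largest absolute row sum); the matrix `s`, the vector `g`, and the function names are hypothetical choices for the illustration.

```python
def apply(matrix, v):
    """Matrix-vector product."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]

def solve_by_iteration(s, g, steps=60):
    """Iterate f <- g + S f, starting from the zero vector."""
    f = [0.0] * len(g)
    for _ in range(steps):
        sf = apply(s, f)
        f = [gi + si for gi, si in zip(g, sf)]
    return f

# Largest absolute row sum is 0.5, so ||S|| < 1 with respect to the maximum norm.
s = [[0.2, 0.3], [0.1, 0.4]]
g = [1.0, 2.0]

f = solve_by_iteration(s, g)

# Residual of the original equation (I - S) f = g.
sf = apply(s, f)
residual = [fi - si - gi for fi, si, gi in zip(f, sf, g)]
```

After $k$ steps the iterate equals $(I + S + \cdots + S^{k-1})g$, so the loop is computing partial sums of the series in Theorem 3; for this example the residual is zero to within rounding error.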

Definition 1: The set of values of the scalar $\lambda$ for which $(\lambda I - T)^{-1}$ exists is called the resolvent set of the operator $T$; the remaining values of $\lambda$ constitute the spectrum of $T$, denoted $\sigma(T)$.

Theorem 4: If $|\lambda| > \|T\|$, the operator $\lambda I - T$ is invertible.

Proof: $\lambda I - T = (\lambda I)(I - \lambda^{-1} T)$. Since $\|\lambda^{-1} T\| = \|T\|/|\lambda| < 1$, Theorem 3 implies that $I - \lambda^{-1} T$ is invertible, and then Theorem 1 guarantees that $\lambda I - T$ is invertible. (Note carefully that $\lambda I - T$ may be invertible when $|\lambda| \le \|T\|$; for example, if $T = I$, $\lambda I - T$ is invertible when $\lambda = i$, although $|\lambda| \le \|T\| = 1$.)

Thus, the spectrum of $T$ is confined to the closed disc $\{\lambda \mid |\lambda| \le \|T\|\}$ of the complex plane. Since the corollary to Theorem 3 is obviously equivalent to the assertion that the resolvent set of $T$ is open, we obtain the following result.

Theorem 5: $\sigma(T)$ is a compact subset of the closed disc $\{\lambda \mid |\lambda| \le \|T\|\}$.

It is conceivable, of course, that the spectrum may be empty, and, 
in fact, this can actually occur in real Banach spaces. (Cf. Exercise 2.) 
However, in Appendix B we show that the spectrum of an operator on any 
complex Banach space cannot be empty. The proof will be readily 
understood by the reader if he is acquainted with the elements of the 
theory of analytic functions (through Liouville’s theorem). 

Definition 2: If there exists a non-zero vector $f$ and a scalar $\mu$ such that $Tf = \mu f$, $\mu$ is called a characteristic value, or eigenvalue, of $T$, and $f$ is called a characteristic vector, or eigenvector, of $T$. Since $Tf = \mu f$ and $Tf = \nu f$ obviously imply that $\mu = \nu$ (if $f \ne o$), we may refer to $\mu$ as the eigenvalue of $T$ associated with $f$.

We conclude this section with the observation that any eigenvalue of an operator $T$ belongs to the spectrum of $T$, for if $Tf = \mu f$ ($f \ne o$), then $(\mu I - T)f = (\mu I - T)o = o$, and so $\mu I - T$ is not one-to-one. However, as shown in Exercise 3, the spectrum may contain numbers which are not eigenvalues.

Exercises

1. Prove that it is not permissible to drop the commutativity condition in Theorem 2.

2. Give an example of an operator on a real Banach space possessing empty spectrum.

3. Consider the (complex) Hilbert space $L^2([0, 1])$, and let $Tf = xf$. (More precisely, $Tf = g$, where $g(x) = xf(x)$ almost everywhere.) Show that (a) $T$ is hermitian, (b) $\|T\| = 1$, (c) $T$ possesses no eigenvalues, and (d) the spectrum of $T$ consists precisely of the interval $[0, 1]$.



§8. SPECTRA OF HERMITIAN, NORMAL,
AND UNITARY OPERATORS

In this section we present some elementary but important facts 
concerning the spectra of certain types of operators on a Hilbert space. 


Theorem 1: (a) The eigenvalues of a hermitian operator $T$ are real.
(b) If $g_1$ and $g_2$ are eigenvectors of $T$ corresponding to distinct eigenvalues
$\mu_1$ and $\mu_2$, respectively, then $(g_1, g_2) = 0$.

Proof: (a) Let $\mu$ be any eigenvalue of $T$ and let $g$ be an eigenvector
associated with $\mu$. Then $(Tg, g) = (\mu g, g) = \mu(g, g)$, and so
$\mu = (Tg, g)/(g, g)$. Since $(Tg, g)$ is real, $\mu$ is the quotient of two real numbers,
and hence is real.

(b) Since $T$ is hermitian, $(Tg_1, g_2) = (g_1, Tg_2)$, and so
$(\mu_1 g_1, g_2) = (g_1, \mu_2 g_2)$. Since $\mu_2$ (like $\mu_1$) is real, we obtain
$\mu_1(g_1, g_2) = \mu_2(g_1, g_2)$, or $(\mu_1 - \mu_2)(g_1, g_2) = 0$. Since $\mu_1 \ne \mu_2$,
it follows that $(g_1, g_2) = 0$.
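Both parts of Theorem 1 can be observed on a concrete 2-by-2 example. The sketch below uses a hermitian matrix chosen purely for illustration and finds its eigenvalues from the quadratic formula; the inner product is taken to be linear in the first argument, which matches the manipulation $(\mu g, g) = \mu(g, g)$ used in the proof.

```python
import cmath

def inner(u, v):
    # Inner product: linear in the first argument, conjugate-linear in the second.
    return sum(a * b.conjugate() for a, b in zip(u, v))

def apply(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

# Illustrative hermitian matrix (equal to its own conjugate transpose).
T = [[2 + 0j, 1 - 1j],
     [1 + 1j, 3 + 0j]]

# Roots of mu^2 - (trace) mu + (det) = 0.
trace = T[0][0] + T[1][1]
det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
root = cmath.sqrt(trace * trace - 4 * det)
mu1, mu2 = (trace - root) / 2, (trace + root) / 2

# When T[0][1] != 0, the vector (T[0][1], mu - T[0][0]) solves (T - mu I)v = 0.
g1 = [T[0][1], mu1 - T[0][0]]
g2 = [T[0][1], mu2 - T[0][0]]
```

Here `mu1` and `mu2` have zero imaginary part, and `inner(g1, g2)` vanishes, in agreement with parts (a) and (b).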

Theorem 2: The spectrum of a hermitian operator $T$ is confined
entirely to the real axis. (This result contains the first part of Theorem 1.)

Proof: (a) If $\mu$ is not real, then the preceding theorem shows that
the equation $(\mu I - T)f = o$ implies that $f = o$. Therefore, $\mu I - T$ is
certainly one-to-one.

(b) Next we shall prove that the range of $\mu I - T$ is a dense subset of
$H$ whenever $\mu$ is not real. For, in the contrary case, the closure $S$ of the
range of $\mu I - T$ would be a proper subspace of $H$. By the projection
theorem, there would exist a non-zero vector $h$ orthogonal to $S$, and hence
orthogonal to the range of $\mu I - T$. Thus, for every vector $f$,

$$(h, (\mu I - T)f) = 0;$$

choosing $f = h$, we obtain $(h, \mu h) = (h, Th)$, and so $\bar{\mu} = (h, Th)/(h, h)$.
Since $T$ is hermitian, $\bar{\mu}$, and hence $\mu$, would be real, contrary to hypothesis.

(c) Now let $f$ be any vector. From (b) we know that there exists a
sequence of vectors $f_1, f_2, f_3, \ldots$ belonging to the range of $\mu I - T$ and
converging to $f$. Therefore, we can find a sequence of vectors $g_1, g_2, g_3, \ldots$
such that $(\mu I - T)g_n = f_n$. The sequence $(\mu I - T)g_n$, being convergent,
is Cauchy; therefore, given $\epsilon > 0$, there exists an index $N(\epsilon)$ such that
$\epsilon > \|(\mu I - T)(g_n - g_m)\|$ whenever $m$ and $n$ both exceed $N(\epsilon)$. Now, a
simple calculation (cf. Exercise 1) shows that

$$\|(\mu I - T)(g_n - g_m)\|^2 = \|(\alpha I - T)(g_n - g_m)\|^2 + \beta^2 \|g_n - g_m\|^2,$$

where $\mu = \alpha + i\beta$, $\alpha$ and $\beta$ real. Since $\beta \ne 0$, we obtain the inequality
$\epsilon > |\beta| \cdot \|g_n - g_m\|$, or $\|g_n - g_m\| < \epsilon/|\beta|$. Thus, the sequence of $g$'s is
Cauchy, and since $H$ is complete, the $g$'s converge to some vector, say $h$.
We now observe that
$(\mu I - T)h - f = ((\mu I - T)g_n - f) + (\mu I - T)(h - g_n) = (f_n - f) + (\mu I - T)(h - g_n)$,
and so $\|(\mu I - T)h - f\| \le \|f_n - f\| + \|\mu I - T\| \cdot \|h - g_n\|$. Letting
$n \to \infty$, we conclude that $(\mu I - T)h = f$. Therefore, the mapping $\mu I - T$ is
onto, and the proof is complete.

We now turn to the class of normal operators, which possess some 
of the important features of hermitian operators. 

Definition 1: An operator $N$ is said to be normal if it commutes
with its adjoint: $NN^* = N^*N$.

A number of consequences of this definition are presented in 
Exercises 2 to 5. 

Theorem 3: The operator $N$ is normal iff $\|Nf\| = \|N^*f\|$ for every
vector $f$.

Proof: Suppose that $N$ is normal. Then, for any vector $f$,
$\|Nf\|^2 = (Nf, Nf) = (N^*Nf, f) = (NN^*f, f) = (N^*f, N^*f) = \|N^*f\|^2$, and so
$\|Nf\| = \|N^*f\|$. Conversely, if $\|Nf\| = \|N^*f\|$ for every vector $f$, we
obtain $((NN^* - N^*N)f, f) = (NN^*f, f) - (N^*Nf, f) = (N^*f, N^*f) - (Nf, Nf) = \|N^*f\|^2 - \|Nf\|^2 = 0$.
By Theorem 5-1, $NN^* - N^*N = O$,
or $NN^* = N^*N$, and so $N$ is normal.

Theorem 4: If $g$ is an eigenvector of the normal operator $N$
associated with the eigenvalue $\mu$, then $g$ is also an eigenvector of $N^*$
associated with the eigenvalue $\bar{\mu}$.

Proof: According to the last part of Exercise 3, $N - \mu I$ is normal.
Employing Theorem 3 we obtain $0 = \|(N - \mu I)g\| = \|(N - \mu I)^*g\| = \|(N^* - \bar{\mu}I)g\|$,
and so $(N^* - \bar{\mu}I)g = o$, or $N^*g = \bar{\mu}g$.

Theorem 5: If $g_1$ and $g_2$ are eigenvectors of the normal operator $N$,
corresponding to distinct eigenvalues $\mu_1$ and $\mu_2$, respectively, then $(g_1, g_2) = 0$.

Proof: Employing the result of the preceding theorem, we obtain
$\mu_1(g_1, g_2) = (\mu_1 g_1, g_2) = (Ng_1, g_2) = (g_1, N^*g_2) = (g_1, \bar{\mu}_2 g_2) = \mu_2(g_1, g_2)$.
Hence, $(\mu_1 - \mu_2)(g_1, g_2) = 0$, and so $(g_1, g_2) = 0$.

Definition 2: An operator $T$ is said to be norm-preserving if
$\|Tf\| = \|f\|$ for every vector $f$. (Note that this condition guarantees that
$T$ is one-to-one.)

Theorem 6: (a) $T$ is norm-preserving iff $T^*T = I$.

(b) If $T$ is norm-preserving, it also preserves all inner products:
$(Tf, Tg) = (f, g)$ for every pair of vectors $f$, $g$.

Proof: (a) If $T^*T = I$, then, for every vector $f$, $\|f\|^2 = (f, f) = (T^*Tf, f) = (Tf, Tf) = \|Tf\|^2$,
and so $\|Tf\| = \|f\|$. Conversely, if $T$ is
norm-preserving, then, for every vector $f$,
$((I - T^*T)f, f) = (f, f) - (T^*Tf, f) = (f, f) - (Tf, Tf) = \|f\|^2 - \|Tf\|^2 = 0$, and so, by Theorem
5-1, $T^*T = I$.

(b) If $T$ is norm-preserving, then, for every pair of vectors $f$, $g$ we
obtain $(Tf, Tg) = (T^*Tf, g) = (If, g) = (f, g)$.

Definition 3: An operator $U$ is said to be unitary if it is both norm-preserving
and invertible. (Cf. Exercise 6.)

Theorem 7: $U$ is unitary iff $UU^* = U^*U = I$. (Hence, a unitary
operator is necessarily normal.)

Proof: The proof follows trivially from the preceding theorem and
definition. Note that $U$ is unitary iff $U^*$ is unitary; also, $U$ is unitary iff
$U^* = U^{-1}$.

Theorem 8: The product of any (finite) number of unitary operators
is unitary.

Proof: Left as Exercise 7.

Theorem 9: The spectrum of a unitary operator $U$ lies entirely on
the unit circle, $\{\lambda \mid |\lambda| = 1\}$.

Proof: (a) Since $\|U\| = 1$, it follows directly from Theorem 7-4
that the spectrum of $U$ is confined to the closed unit disc, $\{\lambda \mid |\lambda| \le 1\}$.

(b) Let $|\mu| < 1$. Then for any non-zero vector $f$ we have
$\|Uf\| = \|f\| > \|\mu f\|$, and so $(\mu I - U)f \ne o$. Thus, $\mu I - U$ is one-to-one.

(c) Assuming, as in (b), that $|\mu| < 1$, we shall show that the range of
$\mu I - U$ is dense in $H$. (Note the resemblance to the proof of Theorem
2.) If this were not so, there would exist a non-zero vector $h$ such that
$((\mu I - U)f, h) = 0$ for every vector $f$. Choosing for $f$ the vector $U^*h$, we
would obtain $(\mu U^*h, h) = (UU^*h, h) = (h, h)$, and by the Schwarz
inequality we would obtain $\|h\|^2 \le |\mu| \cdot \|U^*\| \cdot \|h\|^2$. Dividing by $\|h\|^2$
and recalling that $\|U^*\| = 1$, we would obtain $|\mu| \ge 1$, contradicting
our hypothesis.

(d) Continuing to assume that $|\mu| < 1$, we can now extend the result
of (c) to show that the range of $\mu I - U$ consists of all of $H$; the proof is
very similar to that of part (c) of Theorem 2, and we leave the details to the
reader as Exercise 8. Hence, $\mu I - U$ is invertible, and so the spectrum
is confined entirely to the circumference of the unit disc.
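Theorem 9 is easy to observe in two dimensions. The sketch below takes a plane rotation as an illustrative unitary matrix (the angle `theta` is arbitrary), verifies $U^*U = I$ numerically, and computes the eigenvalues from the quadratic formula; none of the names come from the text.

```python
import cmath
import math

theta = 0.7  # arbitrary illustrative angle
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

def conj_transpose(m):
    # Adjoint of a square matrix: transpose and conjugate every entry.
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

U_star_U = mat_mul(conj_transpose(U), U)   # should be the identity

# Eigenvalues as roots of mu^2 - (trace) mu + (det) = 0.
trace = U[0][0] + U[1][1]
det = U[0][0] * U[1][1] - U[0][1] * U[1][0]
root = cmath.sqrt(trace * trace - 4 * det)
eigenvalues = [(trace - root) / 2, (trace + root) / 2]
moduli = [abs(z) for z in eigenvalues]     # both should equal 1
```

For this rotation the eigenvalues are $\cos\theta \pm i\sin\theta$, so both entries of `moduli` equal 1 up to rounding.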

In Appendix H we provide a brief introduction to a particularly
important unitary operator, the Fourier transform on $L^2(R)$.

Exercises

1. Carry out the calculation which is needed in part (c) of the proof
   of Theorem 2. (Of course, the fact that $T$ is hermitian must
   be exploited.)

2. Prove that any scalar multiple of a hermitian operator is normal,
   and that any scalar multiple of a normal operator is also
   normal.

3. Prove that the product of two normal operators is normal iff each
   of them commutes with the adjoint of the other; in fact, if one
   of the commutation relations holds, then so does the other.
   In particular, if $N$ is normal and $a$ is any scalar, then $N - aI$ is
   also normal.

4. Prove that every operator $T$ can be expressed, in a unique manner,
   as $A + iB$, where $A$ and $B$ are hermitian, and that $T$ is normal
   iff $A$ and $B$ commute. (For obvious reasons, $A$ and $B$ are
   termed the real part and imaginary part of $T$, respectively.)

5. Prove that if $N$ is normal the equality $\|N^2\| = \|N\|^2$ holds.

6. Give an example of an operator which is norm-preserving but not
   unitary.

7. Prove Theorem 8.

8. Work out the details of part (d) of the proof of Theorem 9.

9. (a) Let $T$ be any operator whose spectrum does not contain the
   number $i$. Prove that $(T + iI)(T - iI)^{-1} = (T - iI)^{-1}(T + iI)$,
   so that we may speak unambiguously of the operator
   $(T + iI)/(T - iI)$.

   (b) Suppose that $T$ is hermitian, so that its spectrum certainly
   does not contain $i$. Prove that $(T + iI)/(T - iI)$ is unitary.



CHAPTER 7 

OPERATORS ON FINITE- 
DIMENSIONAL SPACES 


Before continuing with the development of the general theory of 
operators, we shall devote the present chapter to the study of some parts 
of the theory of operators on finite-dimensional spaces. The reader is 
surely aware that there exists a close connection, in finite-dimensional 
spaces, between linear transformations and matrices. In fact, this con- 
nection was for a long time the central idea of the whole theory of linear 
algebra, but in recent decades the concept of the transformation has come 
to play the central role, while the matrix now plays a secondary role, of 
significance primarily as a computational tool. 

Throughout this chapter we are dealing with a complex linear space
$V$ of finite dimension $n$. In §1 and §2 the concepts of norm and inner
product play no role.

We assume that the reader is familiar with the elementary manip- 
ulations of matrices and also that he is acquainted with the elements of 
the theory of determinants. 

§1. MATRIX REPRESENTATION OF
LINEAR TRANSFORMATIONS

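Let $T$ be a linear transformation on $V$ and let the vectors $g_1, g_2, \ldots, g_n$
form a basis of $V$. Let $Tg_i$ be represented, for each index $i$, in the form

$$Tg_i = \sum_{j=1}^{n} \alpha_{ji}\, g_j.$$

Then for any vector $f$ we may write

$$f = \sum_{i=1}^{n} \gamma_i g_i$$

and

$$Tf = \sum_{i=1}^{n} \gamma_i\, Tg_i = \sum_{i,j=1}^{n} \alpha_{ji}\gamma_i\, g_j = \sum_{j=1}^{n} \Big( \sum_{i=1}^{n} \alpha_{ji}\gamma_i \Big) g_j.$$

Identifying $f$ with the column vector

$$\begin{pmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_n \end{pmatrix},$$

we see that the preceding equations can be rewritten in the form

$$Tf = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\ \vdots & \vdots & & \vdots \\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn} \end{pmatrix} \begin{pmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_n \end{pmatrix}.$$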

Thus the $n$-by-$n$ matrix appearing in the preceding equation represents
the transformation $T$. However, it is extremely important to
realize that this matrix (which we shall denote by the symbol $T_g$) represents
$T$ with reference to a particular basis, namely, that formed by the
vectors $g_1, g_2, \ldots, g_n$; if a different basis is selected, the representative
matrix will usually be quite different from $T_g$. (Cf. Exercise 1.) If instead
of the basis $\{g_1, g_2, \ldots, g_n\}$ we select the basis $\{h_1, h_2, \ldots, h_n\}$, how is the
representative matrix altered? Clearly, each of the $h$'s can be represented
in the form

$$h_i = \sum_{j=1}^{n} \beta_{ij}\, g_j, \tag{1-1}$$

and similarly each of the $g$'s can be expressed in the form

$$g_i = \sum_{j=1}^{n} \beta'_{ij}\, h_j.$$

Substituting from the latter equation into the former, we obtain

$$h_k = \sum_{l,m=1}^{n} \beta_{kl}\beta'_{lm}\, h_m,$$

and so, since the $h$'s are linearly independent, we conclude that

$$\sum_{l=1}^{n} \beta_{kl}\beta'_{lm} = \delta_{km},$$

where the Kronecker delta, $\delta_{km}$, equals 1 when $k = m$ and 0 when $k \ne m$.

In matrix notation, this last equation may be written $\beta\beta' = I_n$, where
$\beta$ and $\beta'$ are the matrices containing the scalars $\beta_{ij}$ and $\beta'_{ij}$, respectively,
in the $i$-th row and $j$-th column, while $I_n$ is the matrix consisting of ones
down the main diagonal and zeros elsewhere. Thus, the matrices $\beta$ and $\beta'$
are inverses of each other, and therefore their determinants are reciprocals
of each other, so that neither determinant vanishes. (Conversely, if the
$g$'s constitute a basis then the vectors $h_1, h_2, \ldots, h_n$ defined by equation
(1-1) constitute a basis iff the determinant of the matrix $\beta$ does not
vanish.)

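Referring back to the beginning of this section, we obtain

$$f = \sum_{i,m=1}^{n} \gamma_i \beta'_{im}\, h_m, \qquad Tf = \sum_{i,j,m=1}^{n} \alpha_{ji}\gamma_i \beta'_{jm}\, h_m.$$

If we now express $f$ directly in terms of the $h$'s:

$$f = \sum_{m=1}^{n} \gamma'_m h_m,$$

and then identify $f$ with the column-vector

$$\begin{pmatrix} \gamma'_1 \\ \vdots \\ \gamma'_n \end{pmatrix},$$

we immediately obtain

$$\begin{pmatrix} \gamma'_1 \\ \vdots \\ \gamma'_n \end{pmatrix} = \tilde{\beta}' \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_n \end{pmatrix}, \qquad T_h \begin{pmatrix} \gamma'_1 \\ \vdots \\ \gamma'_n \end{pmatrix} = \tilde{\beta}'\, T_g \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_n \end{pmatrix}, \tag{1-2}$$

where $\tilde{\beta}'$ is the transpose of the matrix $\beta'$. From the first equation of (1-2)
we obtain

$$\begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_n \end{pmatrix} = \tilde{\beta}'^{-1} \begin{pmatrix} \gamma'_1 \\ \vdots \\ \gamma'_n \end{pmatrix}.$$

Now, by substituting into the second equation of (1-2), we obtain

$$T_h \begin{pmatrix} \gamma'_1 \\ \vdots \\ \gamma'_n \end{pmatrix} = \tilde{\beta}'\, T_g\, \tilde{\beta}'^{-1} \begin{pmatrix} \gamma'_1 \\ \vdots \\ \gamma'_n \end{pmatrix}.$$

Thus, we have obtained the desired relation between the representative
matrices $T_g$ and $T_h$, namely:

$$T_h = \tilde{\beta}'\, T_g\, \tilde{\beta}'^{-1}. \tag{1-3}$$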

It might be well to emphasize that the matrices $T_g$ and $T_h$ represent
the same operator, $T$, with respect to different bases, while the matrices
$\tilde{\beta}'$ and $\tilde{\beta}'^{-1}$ relate the two different bases under consideration but have
nothing to do with $T$.

From (1-3) we immediately obtain
$\det T_h = (\det \tilde{\beta}')(\det T_g)(\det \tilde{\beta}'^{-1}) = (\det T_g)(\det(\tilde{\beta}'\tilde{\beta}'^{-1})) = (\det T_g)(\det I_n) = \det T_g$. Thus,
although the matrices $T_g$ and $T_h$ are in general quite different, their
determinants are equal; that is to say, the value of the determinant
represents an intrinsic property of the transformation $T$, so that we may
speak unambiguously of the quantity $\det T$.
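The invariance of the determinant under a change of basis can be checked with exact rational arithmetic. In the sketch below the matrix `P` stands in for the change-of-basis matrix appearing in (1-3); both `P` and `T_g` are hypothetical values chosen only for illustration, and the helper names are not from the text.

```python
from fractions import Fraction

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    # Inverse of a 2-by-2 matrix with non-vanishing determinant.
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]

T_g = [[Fraction(2), Fraction(1)],
       [Fraction(0), Fraction(3)]]
P = [[Fraction(1), Fraction(2)],      # any invertible matrix would do here
     [Fraction(1), Fraction(3)]]

T_h = mat_mul(mat_mul(P, T_g), inv2(P))   # same operator, different basis
```

The two matrices `T_g` and `T_h` differ entry by entry, yet `det2(T_h)` equals `det2(T_g)`; the sum of the diagonal entries is preserved as well.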

We leave to the reader, as Exercise 2, the proof of the facts that if $T_1$
and $T_2$ are operators and $a_1$ and $a_2$ are scalars, then
$(a_1T_1 + a_2T_2)_g = a_1(T_1)_g + a_2(T_2)_g$ and $(T_1T_2)_g = (T_1)_g(T_2)_g$; also, if $T^{-1}$ exists, then
$(T^{-1})_g = (T_g)^{-1}$, and $T^{-1}$ exists iff $\det T \ne 0$. These results, simple
though they are, provide the essence of the proof that (once a particular
basis is selected) there exists an isomorphism between the algebra of
linear transformations on $V$ and the algebra of $n$-by-$n$ matrices.

Exercises

1. Show that the zero operator and the identity operator are always
   represented by the same matrices respectively, regardless of
   the choice of basis.

2. Prove the statements made in the final paragraph.

3. Let $V$ be the space of polynomials of degree $< 4$ and let $Tf$ be the
   first derivative of $f$ for every vector $f$ in $V$. Set up the matrix
   representing $T$ with respect to the basis $\{1, x, x^2, x^3\}$.

§2. EIGENVALUES AND EIGENVECTORS 

The definitions of the terms eigenvalue, eigenvector, and spectrum 
do not require the concept of a norm. Hence, we may employ these 
concepts in our present setting. We shall see that, in the finite-dimensional 
case, the spectrum consists exclusively of eigenvalues, and the existence 
of at least one eigenvalue (and hence the non-emptiness of the spectrum) 
will follow from the fundamental theorem of algebra. 

The existence of a non-zero vector $f$ satisfying the relation $Tf = \mu f$
is known, from the elements of linear algebra, to be equivalent to the
vanishing of $\det(T - \mu I)$. Choosing a basis $\{g_1, g_2, \ldots, g_n\}$ and setting
up the matrix $T_g$, as in the preceding section, we immediately see that the
eigenvalues of $T$ are the roots of the so-called characteristic equation
associated with $T$:

$$\begin{vmatrix} \alpha_{11} - \mu & \alpha_{12} & \cdots & \alpha_{1n} \\ \alpha_{21} & \alpha_{22} - \mu & \cdots & \alpha_{2n} \\ \vdots & \vdots & & \vdots \\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn} - \mu \end{vmatrix} = 0.$$

The left side of this equation is clearly a polynomial (in $\mu$) of degree $n$,
with leading term $(-1)^n\mu^n$. By the fundamental theorem of algebra, this
polynomial, which is called the characteristic polynomial of the matrix $T_g$,
possesses exactly $n$ zeros if multiplicity is counted; thus, there may be
anywhere between one and $n$ distinct eigenvalues. (Note that this argument
fails completely in the real field.) The usual case is that the $n$ zeros
of the characteristic polynomial are all distinct. When this occurs one
can give a very simple description of the linear transformation $T$ under
consideration; when multiple zeros occur the situation becomes more
complicated, and we shall discuss it only briefly.

Theorem 1: Let $\mu_1, \mu_2, \ldots, \mu_k$ be distinct eigenvalues of $T$
(clearly, $k \le n$) and let $g_1, g_2, \ldots, g_k$ be eigenvectors associated
respectively with these eigenvalues. Then the $g$'s are linearly independent.

Proof: Suppose that the $g$'s were linearly dependent. Then we
could select fewer than $k$ of them (say $g_1, g_2, \ldots, g_j$) which are linearly
independent and such that each of the remaining $g$'s is expressible uniquely
as a combination of the aforementioned $g$'s. In particular, we could
write

$$g_k = \alpha_1 g_1 + \alpha_2 g_2 + \cdots + \alpha_j g_j.$$

It would then follow that

$$Tg_k = \alpha_1\mu_1 g_1 + \alpha_2\mu_2 g_2 + \cdots + \alpha_j\mu_j g_j.$$

Since $Tg_k = \mu_k g_k$, the latter equation becomes

$$\mu_k g_k = \alpha_1\mu_1 g_1 + \alpha_2\mu_2 g_2 + \cdots + \alpha_j\mu_j g_j.$$

Multiplying the first equation by $\mu_k$ and subtracting from the last, we
obtain

$$o = \alpha_1(\mu_1 - \mu_k) g_1 + \alpha_2(\mu_2 - \mu_k) g_2 + \cdots + \alpha_j(\mu_j - \mu_k) g_j.$$

Since the $g$'s appearing in the last equation are linearly independent, all
the coefficients must vanish, and since, by hypothesis, $\mu_i - \mu_k \ne 0$ for
$1 \le i \le j$, we conclude that $\alpha_1 = \alpha_2 = \cdots = \alpha_j = 0$, and so $g_k = o$,
contradicting the fact that $g_k$ is an eigenvector.

Corollary: If $T$ possesses $n$ distinct eigenvalues $\mu_1, \mu_2, \ldots, \mu_n$ and
if $g_1, g_2, \ldots, g_n$ are eigenvectors associated respectively with these
eigenvalues, they constitute a basis of the space $V$.

Proof: According to the theorem, the $g$'s constitute a set of $n$
linearly independent vectors in an $n$-dimensional space, hence they
constitute a basis.

Corollary: If $T$ possesses $n$ distinct eigenvalues, the manifold of
solutions of each equation $Tf = \mu_k f$ is one-dimensional; i.e., it is impossible
to find linearly independent vectors $f_1$, $f_2$ such that $Tf_1 = \mu_k f_1$ and $Tf_2 = \mu_k f_2$.

Proof: From the preceding corollary it is evident that the collection
of vectors consisting of $f_1$, $f_2$, and all the $g$'s except $g_k$ would be linearly
independent; this is impossible, since an $n$-dimensional space cannot
contain more than $n$ linearly independent vectors.

The latter corollary suggests the following definition. 

Definition 1: An eigenvalue $\mu$ is said to be simple, or non-degenerate,
if the equation $Tf = \mu f$ does not possess two linearly independent solutions.
If $\mu$ is not simple, it is said to be degenerate, and the degree of degeneracy
is equal to the maximum number of linearly independent eigenvectors associated
with $\mu$.


Thus, the second corollary asserts that if all the eigenvalues are
distinct they are simple. As indicated previously, the situation becomes
more complicated when at least one of the eigenvalues is a multiple root of
the characteristic polynomial. To begin with a trivial example, we see that
the characteristic equation associated with the identity transformation $I$
assumes the form $(1 - \mu)^n = 0$, and so the only eigenvalue is 1. Since
every non-zero vector is obviously an eigenvector, the eigenvalue is
degenerate of degree $n$, and it is possible, as in the case that $n$ distinct
eigenvalues exist, to select a basis consisting entirely of eigenvectors.

This result suggests very strongly that it is always possible to select a
basis consisting entirely of eigenvectors, whether or not repetitions occur
among the roots of the characteristic equation. However, this is not so, as
shown by the following example, in which for definiteness we take $n = 4$.
Let $T$ be represented, with respect to some basis $\{g_1, g_2, g_3, g_4\}$, by the
matrix

$$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

The characteristic equation is obviously $(1 - \mu)^4 = 0$ (even though $T$ is
not the identity). Thus, any eigenvector, when represented as a column-vector,
must satisfy the equality

$$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{pmatrix}.$$

We immediately see that $\alpha_2 = \alpha_3 = \alpha_4 = 0$; i.e., every eigenvector must
be a (non-zero) multiple of the vector corresponding to the column-vector

$$\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix};$$

i.e., $g_1$ and its multiples are the only eigenvectors, and so the eigenvectors
do not span the space.
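The conclusion of this example, that the eigenvectors fill only a one-dimensional manifold, can be confirmed by computing the rank of $A - I$, where $A$ denotes the 4-by-4 matrix above. The sketch below uses plain Gaussian elimination over exact fractions; the function `rank` is an illustrative helper, not something defined in the text.

```python
from fractions import Fraction

def rank(matrix):
    # Row-reduce a copy of the matrix over exact fractions and count the pivots.
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r

A = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 1]]
A_minus_I = [[A[i][j] - (1 if i == j else 0) for j in range(4)] for i in range(4)]

# Dimension of the solution space of (A - I)a = 0.
eigenspace_dim = 4 - rank(A_minus_I)
```

The rank of $A - I$ is 3, so the solution space of $(A - I)a = 0$ has dimension 1, in agreement with the statement that only the multiples of $g_1$ are eigenvectors.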



We refer the reader to texts on linear algebra for a detailed analysis
of operators whose characteristic equation possesses repeated roots.

Returning to the case that the eigenvectors are known to span the
space, let us choose a basis $\{g_1, g_2, \ldots, g_n\}$ of eigenvectors and let us
write down in the same order their associated eigenvalues, $\mu_1, \mu_2, \ldots, \mu_n$.
(Remember that repetitions may occur among the $\mu$'s.) Then it is quite
obvious that the matrix $T_g$ representing $T$ with respect to this basis must
assume the form

$$T_g = \begin{pmatrix} \mu_1 & 0 & \cdots & 0 \\ 0 & \mu_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \mu_n \end{pmatrix}. \tag{2-1}$$

Conversely, it is evident that if $T$ can be represented, with respect to some
basis, by a diagonal matrix, then the basis must consist of eigenvectors and
the diagonal entries of the matrix must coincide with the eigenvalues, each
eigenvalue appearing a number of times equal to its degree of degeneracy.

This result can be restated as follows, if we take account of the work
in §1 dealing with the effect of a change of basis on the representative
matrix: The eigenvectors of $T$ span $V$ iff the matrix $T_f$, for arbitrary
choice of a basis $\{f_1, f_2, \ldots, f_n\}$, has the property that there exists an
(invertible) matrix $C$ such that $C^{-1}T_fC$ is a diagonal matrix; we say that
$T_f$ is diagonable and that $C$ diagonalizes $T_f$.

We now return to the characteristic equation associated with $T$.
When the left side of this equation is worked out, we obtain (cf. Exercise 1)

$$(-1)^n \{\mu^n - (\alpha_{11} + \alpha_{22} + \cdots + \alpha_{nn})\mu^{n-1} + \cdots + (-1)^n \det T_g\} = 0.$$

Since the roots of this equation are the eigenvalues of $T$ and since the
leading coefficient, namely $(-1)^n$, is independent of the choice of basis, it
follows that all the coefficients must be independent of the choice of
basis. In particular, the sum of the diagonal elements, $\alpha_{11} + \alpha_{22} + \cdots + \alpha_{nn}$,
which is termed the trace of the matrix $T_g$, must be independent of the
choice of basis, and by observing the constant term of the left side we
find once again that the determinant of the representative matrix depends
only on $T$, not on the choice of the basis. Furthermore, we easily see that
the trace must equal the sum of the eigenvalues, while the quantity $\det T_g$
must equal the product of the eigenvalues.

As the last topics in this section we shall present the Hamilton- 
Cayley theorem and an application of this theorem to nilpotent operators. 



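Given any $n$-by-$n$ matrix

$$A = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\ \vdots & \vdots & & \vdots \\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn} \end{pmatrix},$$

we may obviously speak of the characteristic polynomial associated with
$A$ and of the eigenvalues of $A$, even though no linear space $V$ and linear
transformation on $V$ are under consideration.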


Theorem 2 (Hamilton-Cayley): Let the variable $\mu$ in the
characteristic polynomial of the matrix $A$ be replaced by $A$; the resulting
matrix reduces to the $n$-by-$n$ zero matrix $O_n$. (This theorem is often stated
as follows: Every [square] matrix satisfies its own characteristic equation.)

Proof: First suppose that the eigenvalues $\mu_1, \mu_2, \ldots, \mu_n$ of $A$ are
all distinct. Choose any $n$-dimensional space $V$ and any basis
$\{g_1, g_2, \ldots, g_n\}$ of $V$, and let $T$ be that operator on $V$ whose representative
matrix, with respect to the basis $\{g_1, g_2, \ldots, g_n\}$, is $A$. Choose eigenvectors
$f_1, f_2, \ldots, f_n$ associated, respectively, with the eigenvalues $\mu_1, \mu_2, \ldots, \mu_n$.
Since the characteristic polynomial, $c(\mu)$, can be written in factored form as follows:

$$c(\mu) = (\mu_1 - \mu)(\mu_2 - \mu) \cdots (\mu_n - \mu),$$

we readily obtain

$$c(A) = (\mu_1 I - A)(\mu_2 I - A) \cdots (\mu_n I - A).$$

Since $(\mu_n I - A)f_n = o$, we see that $c(A)f_n = o$. Since the order of the
factors may be rearranged at pleasure, we obtain the more general result
that $c(A)f_k = o$, $1 \le k \le n$. By linearity, $c(A)$ annihilates every linear
combination of the vectors $f_1, f_2, \ldots, f_n$. Since the $f$'s constitute a basis,
$c(A)$ must annihilate every vector in $V$. Thus, $c(A)$ must represent the
zero operator (with respect to the original basis $\{g_1, g_2, \ldots, g_n\}$), and so
$c(A)$ must be the zero matrix, $O_n$.

If the characteristic equation of $A$ possesses repeated roots, we employ
a continuity argument. It is not difficult to show that, given $\epsilon > 0$, there
exists a matrix $A_\epsilon$ whose eigenvalues are distinct and whose elements
differ from the corresponding elements of $A$ by less than $\epsilon$ in absolute
value. Thus, $c_\epsilon(A_\epsilon) = O_n$, where $c_\epsilon(\mu)$ is the characteristic polynomial
associated with $A_\epsilon$. Choosing a sequence of $\epsilon$'s converging to zero, we
obtain the desired result, namely $c(A) = O_n$.
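The Hamilton-Cayley theorem can be tested on the 4-by-4 example given earlier, a case with a repeated root. The sketch below obtains the coefficients of $\det(\mu I - A)$ by the Faddeev-LeVerrier recurrence, a technique not used in the text and adopted here only for convenience; this polynomial differs from $c(\mu) = \det(A - \mu I)$ by the factor $(-1)^n$, so one vanishes at $A$ exactly when the other does. All function names are illustrative.

```python
from fractions import Fraction

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add_scaled_identity(m, c):
    # Return m + c*I.
    return [[m[i][j] + (c if i == j else 0) for j in range(len(m))]
            for i in range(len(m))]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def char_poly_coeffs(a):
    # Coefficients c[0..n] with det(mu*I - A) = sum_k c[k] * mu^k.
    n = len(a)
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[n] = Fraction(1)
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        m = add_scaled_identity(mat_mul(a, m), coeffs[n - k + 1])
        coeffs[n - k] = -trace(mat_mul(a, m)) / k
    return coeffs

def poly_at_matrix(coeffs, a):
    # Horner's rule, with each constant c read as c*I.
    n = len(a)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = add_scaled_identity(mat_mul(result, a), c)
    return result

A = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 1]]
coeffs = char_poly_coeffs(A)          # coefficients of (mu - 1)^4
c_of_A = poly_at_matrix(coeffs, A)    # should be the zero matrix
```

For this matrix the coefficients are those of $(\mu - 1)^4$, and substituting $A$ for $\mu$ yields the zero matrix even though the eigenvalue 1 is repeated four times.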

The Hamilton-Cayley theorem enables us to compute $A^n$ easily once
the matrices $A, A^2, \ldots, A^{n-1}$ are known. Then $A^{n+1}$ is evidently expressible
as a combination of $A, A^2, \ldots,$ and $A^n$; replacing $A^n$ by the expression
already obtained, we conclude that $A^{n+1}$ (and all higher powers of $A$)
can be expressed as linear combinations of the matrices $I_n, A, A^2, \ldots, A^{n-1}$.
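As a small worked illustration of this reduction (not part of the original text), take $n = 2$ and write $\tau$ for the trace and $\delta$ for the determinant of $A$; these two symbols are introduced here only for the example. The characteristic equation then gives each higher power of $A$ in terms of $I_2$ and $A$:

```latex
\begin{align*}
  \det(A - \mu I_2) &= \mu^2 - \tau\mu + \delta, \\
  A^2 &= \tau A - \delta I_2, \\
  A^3 &= \tau A^2 - \delta A = (\tau^2 - \delta) A - \tau\delta I_2 .
\end{align*}
```

The second line is the Hamilton-Cayley theorem itself; the third follows by multiplying the second by $A$ and substituting for $A^2$ once more.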

Recalling the definition of a nilpotent operator given in Chapter 6,
we define a nilpotent matrix as a (non-zero) matrix satisfying the equation
$A^k = O_n$ for some positive integer $k$ (where, of course, $A$ is $n$-by-$n$).
Let $\mu$ denote any eigenvalue of a nilpotent matrix $A$, and, thinking of $A$ as
representing an operator $T$ on some space $V$, let $f$ be an eigenvector of $T$
corresponding to $\mu$. Then $Tf = \mu f$, $T^2f = \mu Tf = \mu^2 f$, $\ldots$, $T^kf = \mu^k f$.
Since $T^k$ is represented by the matrix $A^k$, it follows that $T^k$ must be the
zero operator. Hence, $\mu^k f = o$, and since $f \ne o$, it follows that $\mu^k$, and
hence $\mu$, must vanish.

Now, conversely, suppose that $A$ has no eigenvalues except zero.
Then the characteristic polynomial of $A$ must be $(-1)^n\mu^n$, and the
Hamilton-Cayley theorem now assures us that $A^n = O_n$. We have thus
obtained the following interesting characterization of nilpotent matrices.

Theorem 3: The non-zero $n$-by-$n$ matrix $A$ is nilpotent iff all its
eigenvalues are zero. Furthermore, it is not necessary to go beyond $A^n$ in the
sequence $A^2, A^3, \ldots$ before reaching the matrix $O_n$. (Cf. Exercise 3.)

Exercises

1. Work out explicit expressions for all the coefficients appearing in
   the characteristic equation.

2. By employing the formulas developed in §1, demonstrate the
   invariance of the trace under change of basis, without any
   consideration of eigenvalues.

3. Construct a nilpotent $n$-by-$n$ matrix $A$ such that $A^{n-1} \ne O_n$.

§3. FINITE-DIMENSIONAL INNER-PRODUCT SPACES 

We now turn to the case that the space $V$ is provided with an inner
product. We shall see that a very complete and simple description of
hermitian operators can be worked out and that a substantial portion of
the theory can be carried over to normal operators. Furthermore, in the
case of hermitian operators the existence of eigenvalues and the fact that
the eigenvectors span $V$ will be demonstrated without resorting to either
the fundamental theorem of algebra or to the representation of operators
by matrices. We shall frequently employ, without specific reference,
results obtained in the previous chapter. It must be kept in mind that the
arguments employed here are, by their nature, confined to finite-dimensional
spaces.

Theorem 1: If $T$ is hermitian, then at least one of the numbers
$\|T\|$, $-\|T\|$ is an eigenvalue of $T$.

Proof: Since $\|T\| = \sup_{\|f\|=1} |(Tf, f)|$, it follows that either
$\|T\| = \sup_{\|f\|=1} (Tf, f)$ or $-\|T\| = \inf_{\|f\|=1} (Tf, f)$. It suffices to consider the
former case, for otherwise we may simply replace $T$ by $-T$. Introducing
the symbol $\mu$ for $\|T\|$, we then have $\mu = \|T\| = \sup_{\|f\|=1} (Tf, f)$. Since $V$
is locally compact, the unit vectors form a compact subset of $V$. Since
$(Tf, f)$ depends continuously on $f$, it follows that there exists a unit
vector $g$ such that $(Tg, g) = \mu$. We then obtain

$$\|Tg - \mu g\|^2 = (Tg - \mu g, Tg - \mu g) = (Tg, Tg) - 2\mu(Tg, g) + \mu^2(g, g) = \|Tg\|^2 - 2\mu^2 + \mu^2 \le \|T\|^2 - \mu^2 = 0.$$

Hence, $\|Tg - \mu g\| = 0$, and so $Tg = \mu g$.

Thus, not only have we shown that $\|T\|$ (or $-\|T\|$) is an eigenvalue
of $T$; we have also shown how, in principle, the norm of $T$ and an
eigenvector associated with $\pm\|T\|$ can be found, namely by seeking a
unit vector which maximizes $|(Tf, f)|$. Conversely, if $Tg = \pm\mu g$, then
$(Tg, g) = \pm\mu$ (assuming $\|g\| = 1$). We shall later exhibit some specific
computations relating to this remarkable characterization of the largest
(in absolute value) eigenvalue.

Having found the eigenvalue $\lambda_1$ ($= \pm\mu$) and a corresponding unit
eigenvector, which we denote $g_1$, let us now consider the behavior of $T$
when it is restricted to operating on vectors orthogonal to $g_1$. If $(g_1, h) = 0$,
then $0 = \lambda_1(g_1, h) = (\lambda_1 g_1, h) = (Tg_1, h) = (g_1, Th)$. Thus, if we denote
the subspace consisting of all vectors orthogonal to $g_1$ as $S_1$, we see that $T$
may be considered as an operator mapping $S_1$ into $S_1$; clearly, the
restriction of $T$ to $S_1$ is still hermitian, and so we may repeat the preceding
reasoning (if we exclude the trivial case that $V$ is one-dimensional),
obtaining a (unit) eigenvector $g_2$ associated with an eigenvalue $\lambda_2$, where
$|\lambda_2| = \sup |(Tf, f)|$, the supremum being taken over all unit vectors
orthogonal to $g_1$. By the very nature of the procedure employed, it is clear
that $|\lambda_2| \le |\lambda_1|$. If $n > 2$, we may repeat this procedure so long as there
exists a non-zero vector orthogonal to each of the eigenvectors which have
already been obtained. Thus the procedure which we have described
automatically comes to a halt when a set of $n$ eigenvectors $\{g_1, g_2, \ldots, g_n\}$
and a corresponding set of $n$ eigenvalues $\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ have been
obtained, the eigenvalues satisfying the chain of inequalities
$|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$. Furthermore, since the $n$ eigenvectors are orthonormal, they
are linearly independent and, hence, form an orthonormal basis of $V$.
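The step-by-step procedure just described can be imitated numerically. The sketch below is only an illustration under stated assumptions: it works with a real symmetric matrix (a special case of a hermitian operator), and in place of a direct maximization of $|(Tf, f)|$ it uses power iteration, a different technique that also locates the eigenvalue of largest absolute value provided that eigenvalue strictly dominates the others in absolute value. Restriction to the orthogonal complement is imitated by subtracting the components along the eigenvectors already found. The matrix and all names are hypothetical.

```python
import math

def mat_vec(m, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    length = math.sqrt(dot(v, v))   # assumes v is not the zero vector
    return [x / length for x in v]

def remove_components(v, found):
    # Project v onto the subspace orthogonal to every vector in `found`.
    for g in found:
        c = dot(v, g)
        v = [x - c * y for x, y in zip(v, g)]
    return v

def next_eigenpair(m, found, iterations=500):
    v = [1.0 + 0.1 * i for i in range(len(m))]   # arbitrary starting vector
    for _ in range(iterations):
        v = normalize(remove_components(v, found))
        v = mat_vec(m, v)
    v = normalize(remove_components(v, found))
    return dot(mat_vec(m, v), v), v              # value of (Tg, g), and g

T = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, 5.0]]

eigenvalues, eigenvectors = [], []
for _ in range(len(T)):
    lam, g = next_eigenpair(T, eigenvectors)
    eigenvalues.append(lam)
    eigenvectors.append(g)
```

For this matrix the eigenvalues emerge in order of diminishing absolute value, and the eigenvectors obtained are mutually orthogonal unit vectors, as the argument above predicts.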

It is not difficult to see that if at each stage we maximize $(Tf, f)$
instead of $|(Tf, f)|$ we shall obtain the same set of eigenvalues, but in order
of diminishing algebraic (rather than absolute) value; similarly, by
minimizing $(Tf, f)$ at each stage we would obtain the eigenvalues in
ascending algebraic order.

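Having obtained the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ and the eigenvectors
$g_1, g_2, \ldots, g_n$, we see that the matrix $T_g$ representing $T$ with respect to
the basis $\{g_1, g_2, \ldots, g_n\}$ is given by

$$T_g = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}. \tag{3-1}$$

In contrast to equation (2-1), the diagonal elements of (3-1) are certainly
real, and it is not necessary to assume in the present case that the eigenvectors
of $T$ span the space $V$; the preceding argument shows that this
condition is certainly satisfied.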

Now suppose that $T$ is hermitian and that we represent $T$ with respect
to an arbitrary orthonormal basis $\{h_1, h_2, \ldots, h_n\}$ by the matrix

$$T_h = \begin{pmatrix} \eta_{11} & \eta_{12} & \cdots & \eta_{1n} \\ \eta_{21} & \eta_{22} & \cdots & \eta_{2n} \\ \vdots & \vdots & & \vdots \\ \eta_{n1} & \eta_{n2} & \cdots & \eta_{nn} \end{pmatrix},$$

where the elements of this matrix are determined by the equations (cf. §1)

$$Th_i = \sum_{j=1}^{n} \eta_{ji}\, h_j.$$

Taking account of the orthonormality of the $h$'s, we obtain the chain of
equalities $(Th_i, h_k) = \big(\sum_{j} \eta_{ji} h_j, h_k\big) = \sum_{j} \eta_{ji}\delta_{jk} = \eta_{ki}$.
Similarly, $\eta_{ik} = (Th_k, h_i) = (h_k, Th_i) = \overline{(Th_i, h_k)}$, and so $\eta_{ik} = \bar{\eta}_{ki}$. Thus,
the matrix $T_h$ must have the property that each entry is the conjugate of its
image in the main diagonal. Conversely, if the vectors $\{h_1, h_2, \ldots, h_n\}$
form an orthonormal basis and if $T_h$ possesses the preceding property, it is
evident that $T$ must be hermitian. For this reason, a matrix possessing
the aforementioned property is termed hermitian, even if no inner-product
space is under consideration. It should be emphasized that if the operator
$T$ is hermitian but the chosen basis $\{h_1, h_2, \ldots, h_n\}$ is not orthonormal,
the matrix $T_h$ will in general not be hermitian.

An obvious, but interesting and important, consequence of the fore- 
going analysis of hermitian operators is the following result. 

Theorem 2: A hermitian operator (on a finite-dimensional inner- 
product space) is positive iff all its eigenvalues are non-negative. 


This characterization of the eigenvalues and eigenvectors as the
solutions to certain maximization problems possesses one unsatisfactory
feature. We refer to the fact that the eigenvalues $\lambda_2, \lambda_3, \ldots$ and the
corresponding eigenvectors are defined in terms of the preceding eigenvectors;
for example, $g_3$ is determined as a unit vector which provides the
solution to a certain maximization problem among all unit vectors
orthogonal to $g_1$ and $g_2$. (Incidentally, these remarks apply whether the
eigenvalues are indexed in order of descending algebraic value or of
descending absolute value.) We now indicate briefly how the eigenvalues
and eigenvectors can be characterized directly, without reference to eigenvectors
previously obtained. For definiteness we suppose that the eigenvalues
are to be indexed according to descending algebraic value; the
modifications to be made when the eigenvalues are indexed otherwise
(descending absolute value, ascending algebraic value, ascending absolute
value) will be quite obvious.

We begin by repeating the characterizations of $\lambda_1$ and $g_1$: $\lambda_1 = \max_{\|f\|=1} (Tf, f)$, and $g_1$ is chosen as any unit vector such that $(Tg_1, g_1) = \lambda_1$.

Now let $h$ be any vector and let $\lambda(h)$ denote the value of $\max (Tf, f)$ when $f$ is subjected to the condition $(f, h) = 0$ in addition to the usual condition $\|f\| = 1$. If $h = o$, then clearly $\lambda(h) = \lambda_1$ (since the orthogonality condition provides no restriction on $f$ in this trivial case). Setting aside this trivial case, we may assume that $h$ is a unit vector (for otherwise $h$ may be replaced by $(1/\|h\|)h$) and we may then write $h = \alpha g_1 + \beta g$, where $|\alpha|^2 + |\beta|^2 = 1$ and $g$ is some unit vector orthogonal to $g_1$. The condition $(f, h) = 0$ now assumes the form $\bar{\alpha}(f, g_1) + \bar{\beta}(f, g) = 0$. If we denote $(f, g_1)$ and $(f, g)$ by $\gamma_1$ and $\gamma$, respectively, the conditions $\|f\| = 1$ and $(f, h) = 0$ assume the forms

$$|\gamma_1|^2 + |\gamma|^2 = 1, \qquad \bar{\alpha}\gamma_1 + \bar{\beta}\gamma = 0 \tag{3-3}$$

respectively. We also note that $f = \gamma_1 g_1 + \tilde{\gamma}\tilde{g}$, where $|\tilde{\gamma}| = |\gamma|$ and $\tilde{g}$, like $g$, is a unit vector orthogonal to $g_1$. Hence,

$$(Tf, f) = |\gamma_1|^2 (Tg_1, g_1) + |\tilde{\gamma}|^2 (T\tilde{g}, \tilde{g}) + \gamma_1\bar{\tilde{\gamma}}\,(Tg_1, \tilde{g}) + \tilde{\gamma}\bar{\gamma}_1\,(T\tilde{g}, g_1).$$

The last two terms on the right side vanish, for $(Tg_1, \tilde{g}) = \lambda_1(g_1, \tilde{g}) = 0$ and $(T\tilde{g}, g_1) = (\tilde{g}, Tg_1) = \lambda_1(\tilde{g}, g_1) = 0$. Recalling that $(\tilde{g}, g_1) = 0$, we now see immediately that, for fixed $\gamma_1$ and $\tilde{\gamma}$, the maximum of $(Tf, f)$ is equal to $\lambda_1|\gamma_1|^2 + \lambda_2|\tilde{\gamma}|^2$, or $\lambda_1|\gamma_1|^2 + \lambda_2|\gamma|^2$, and is achieved by choosing for $\tilde{g}$ the eigenvector $g_2$. Taking account of the first part of (3-3), we obtain $\lambda(h) = \lambda_1|\gamma_1|^2 + \lambda_2(1 - |\gamma_1|^2) = (\lambda_1 - \lambda_2)|\gamma_1|^2 + \lambda_2$; since $\lambda_1 - \lambda_2$ is non-negative, the right side of the last equality is minimized by choosing for $\gamma_1$ the value zero, and hence imposing on $\gamma$ the condition $|\gamma| = 1$. The second part of (3-3) now imposes on $\beta$ the value zero, and this in turn imposes on $\alpha$ the condition $|\alpha| = 1$.

Thus, we have shown the following: The quantity $\max_{\|f\|=1,\,(f,h)=0} (Tf, f)$ is minimized by choosing for $h$ the eigenvector $g_1$ (or any non-zero multiple of $g_1$), and with this choice of $h$ the aforementioned maximum is $\lambda_2$. In other words, $\lambda_2$ is the smallest possible value of $\max_{\|f\|=1} (Tf, f)$ which can be obtained by imposing a restriction of the form $(f, h) = 0$. By a rather straightforward extension of the preceding argument we can show that $\lambda_3$ is the smallest possible value of $\max_{\|f\|=1} (Tf, f)$ which can be obtained by imposing on $f$ two restrictions of the form $(f, h_1) = (f, h_2) = 0$. By repeating this argument, we obtain the following direct characterization of each of the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$.

Theorem 3 (Minimax Theorem): The eigenvalue $\lambda_k$ $(1 \le k \le n)$ of the hermitian operator $T$ is equal to the minimum value of the quantity $\max_{\|f\|=1} (Tf, f)$ which can be achieved by imposing on the vector $f$ restrictions of the form $(f, h_1) = (f, h_2) = \cdots = (f, h_{k-1}) = 0$.
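The case k = 2 of the minimax characterization can be checked numerically. The following is only a sketch under stated assumptions: the operator is the hypothetical diagonal matrix diag(3, 2, 1) on real 3-space (so that the eigenvalues 3, 2, 1 are known in advance), and the helper names `constrained_max`, `apply_T`, and so on are illustrative, not taken from the text. For each constraint vector h the code compresses T to the plane orthogonal to h and computes the largest eigenvalue of the resulting 2 x 2 symmetric matrix in closed form.

```python
import math
import random

# Hypothetical example: T = diag(3, 2, 1) on real 3-space, so that
# lambda_1 = 3, lambda_2 = 2, lambda_3 = 1 are known in advance.
DIAG = [3.0, 2.0, 1.0]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def normalize(x):
    n = math.sqrt(dot(x, x))
    return [a / n for a in x]

def cross(x, y):
    return [x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0]]

def apply_T(x):
    return [d * a for d, a in zip(DIAG, x)]

def constrained_max(h):
    """Maximum of (Tf, f) over unit vectors f with (f, h) = 0, h non-zero."""
    h = normalize(h)
    # Build an orthonormal basis {u, v} of the plane orthogonal to h,
    # starting from the coordinate vector least aligned with h.
    i = min(range(3), key=lambda j: abs(h[j]))
    e = [0.0, 0.0, 0.0]
    e[i] = 1.0
    c = dot(e, h)
    u = normalize([a - c * b for a, b in zip(e, h)])
    v = cross(h, u)
    # Entries of T compressed to that plane, then its larger eigenvalue.
    a, b, d = dot(apply_T(u), u), dot(apply_T(u), v), dot(apply_T(v), v)
    return (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)

random.seed(0)
samples = [[random.gauss(0, 1) for _ in range(3)] for _ in range(1000)]
values = [constrained_max(h) for h in samples]   # lambda(h) for random h
at_g1 = constrained_max([1.0, 0.0, 0.0])         # lambda(g_1)
```

In this sketch every sampled value of the constrained maximum lies between 2 and 3, and the choice h = g_1 = (1, 0, 0) attains the value 2, in agreement with the theorem.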


Although the determination of eigenvalues (of a linear transformation 
on a finite-dimensional space) is in principle a problem of elementary 
algebra (the solution of a polynomial equation), a huge amount of 
literature exists on the problem of obtaining accurate approximations for 
the eigenvalues (and eigenvectors). In the remainder of this section and 
in the following section we shall discuss briefly two methods, both of 
which are restricted to hermitian operators. 

Let the hermitian operator $T$ be represented, as before, with respect to some orthonormal basis $\{h_1, h_2, \ldots, h_n\}$ by the matrix (3-2). As we have seen, this matrix must be hermitian. It is quite clear that we may consider $V$, the space on which $T$ operates, to be the space of $n$-tuples of complex numbers (with the usual definition of inner product). If we write the typical vector $g$ of $V$ in the form

$$g = (\gamma_1, \gamma_2, \ldots, \gamma_n),$$

it is readily seen that

$$(Tg, g) = \sum_{i,j=1}^{n} \tau_{ij}\gamma_j\bar{\gamma}_i.$$

Thus, if we enumerate the eigenvalues in descending algebraic order, we obtain

$$\lambda_1 = \max \sum_{i,j=1}^{n} \tau_{ij}\gamma_j\bar{\gamma}_i, \qquad \lambda_n = \min \sum_{i,j=1}^{n} \tau_{ij}\gamma_j\bar{\gamma}_i,$$

the max and min being taken subject to the constraint

$$\sum_{i=1}^{n} |\gamma_i|^2 = 1.$$

In particular, if we choose $\gamma_k = \delta_{km}$ for some index $m$, we obtain $\lambda_n \le \tau_{mm} \le \lambda_1$. Since these inequalities hold for every choice of $m$, we obtain

$$\lambda_n \le \min_{1 \le m \le n} \tau_{mm} \le \max_{1 \le m \le n} \tau_{mm} \le \lambda_1.$$

1 m ■ . /< 1 ■ ' r/ 

A more ambitious procedure consists in choosing a large number of 
unit vectors, spread as nearly densely as possible (this phrase, of course, 
is very imprecise) over the set $\{g \mid \|g\| = 1\}$ and evaluating $(Tg, g)$ for
each of these vectors. We illustrate this idea in the case of an operator 
T having, with respect to some orthonormal basis, the representative 
matrix 



Let us choose the following unit vectors: 




gH == 


gn = 



(Note that we are confining ourselves to trial vectors with real components; the justification of this step is left as Exercise 1. Also, note that if $g_k$ is any one of the trial vectors, the vector $-g_k$ is not employed, for $(T(-g_k), -g_k) = (Tg_k, g_k)$, and so no additional information would be obtained.)

The numbers $(Tg_k, g_k)$ vary between $-1.52$ and $4.52$. Thus, $\lambda_2 \le -1.52$ and $\lambda_1 \ge 4.52$.
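The sampling idea can be sketched in a few lines. The matrix below is a hypothetical real symmetric 2 x 2 matrix chosen for illustration (the text's own matrix is not reproduced here), and the trial vectors are assumed to be (cos t, sin t) at 15-degree steps over a half-circle, since g and -g give the same value of (Tg, g). The closed-form eigenvalues are computed only for comparison.

```python
import math

# Hypothetical real symmetric matrix, used only for illustration.
T = [[1.0, 3.0],
     [3.0, 2.0]]

def quad_form(T, g):
    """(Tg, g) for a real vector g."""
    n = len(g)
    return sum(T[i][j] * g[j] * g[i] for i in range(n) for j in range(n))

# Trial unit vectors (cos t, sin t) for t = 0, 15, ..., 165 degrees.
trials = [(math.cos(math.pi * k / 12), math.sin(math.pi * k / 12))
          for k in range(12)]
values = [quad_form(T, g) for g in trials]

lower_for_lambda1 = max(values)   # lambda_1 is at least this large
upper_for_lambda2 = min(values)   # lambda_2 is at most this large

# Exact eigenvalues of the 2 x 2 symmetric matrix, for comparison only.
tr = T[0][0] + T[1][1]
det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
disc = math.sqrt(tr * tr - 4 * det)
lambda1, lambda2 = (tr + disc) / 2, (tr - disc) / 2
```

Every sampled value necessarily lies between the two eigenvalues, so the sample maximum and minimum are one-sided estimates; a finer set of trial vectors tightens them. The diagonal entries themselves are among the sampled values (t = 0 and t = 90 degrees), which recovers the cruder diagonal bounds.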



Of course, in this particular case the explicit determination of the eigenvalues is completely trivial. We easily obtain

$$\det(T - \lambda I) = \lambda^2 - 3\lambda - 7,$$

and so

$$\lambda_1 = \tfrac{1}{2}(3 + \sqrt{37}) \approx 4.54, \qquad \lambda_2 = \tfrac{1}{2}(3 - \sqrt{37}) \approx -1.54.$$

This computation provides a rather crude illustration of the Rayleigh-Ritz procedure, which will be considered in a little more detail when we deal in the next chapter with infinite-dimensional spaces, particularly in connection with the estimation of eigenvalues of symmetric kernels.

Exercises

1. Let $T$ be a hermitian operator on a complex $n$-dimensional vector space, and suppose that, with respect to some orthonormal basis $\{h_1, h_2, h_3, \ldots, h_n\}$, the elements of the representative matrix $T_h$ are all real. Show that it is possible to find a basis of eigenvectors whose components, with respect to the basis $\{h_1, h_2, \ldots, h_n\}$, are all real.

2. Suppose that, in addition to the assumptions made in Exercise 1, the elements of $T_h$ are all strictly positive. Prove that $\|T\|$ is an eigenvalue, that it is non-degenerate, that associated with this eigenvalue there exists an eigenvector whose components with respect to the basis $\{h_1, h_2, \ldots, h_n\}$ are all positive, and that $-\|T\|$ is not an eigenvalue of $T$.

3. Let $A$ be any hermitian matrix, with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ (in descending algebraic order). Prove that if any one of the diagonal elements (which are, of course, all real) is increased while all other elements are left fixed, then the new set of eigenvalues $\tilde{\lambda}_1, \tilde{\lambda}_2, \ldots, \tilde{\lambda}_n$ (also in descending algebraic order) are such that each of the $n$ inequalities $\tilde{\lambda}_k \ge \lambda_k$ must hold, with strict inequality for at least one value of $k$.




§4. KELLOGG’S METHOD OF ESTIMATING 
THE LARGEST EIGENVALUE 

For ease in exposition we first suppose that the hermitian operator 
T is known to be positive and that the largest eigenvalue is known to be 
non-degenerate. It will then be shown that, with minor modifications, the two restrictions just imposed may be dropped. However, the condition that T should be hermitian cannot be omitted.



From the work of the preceding section we know that the eigenvalues of $T$ are all non-negative and that, if they are denoted in descending algebraic order (which, under our hypothesis that $T$ is positive, also implies descending absolute value) by $\lambda_1, \lambda_2, \ldots, \lambda_n$, with corresponding unit eigenvectors $g_1, g_2, \ldots, g_n$, then for any vector $f$ the equalities

$$f = \sum_{k=1}^{n} (f, g_k) g_k, \qquad Tf = \sum_{k=1}^{n} \lambda_k (f, g_k) g_k$$

hold. In fact, more generally, for any positive integer $m$, we obtain the equality

$$T^m f = \sum_{k=1}^{n} \lambda_k^m (f, g_k) g_k.$$

Assuming that $(f, g_1) \ne 0$, we obtain

$$(T^m f, f) = \sum_{k=1}^{n} \lambda_k^m |(f, g_k)|^2 = \lambda_1^m |(f, g_1)|^2 \left(1 + \sum_{k=2}^{n} \left(\frac{\lambda_k}{\lambda_1}\right)^m \frac{|(f, g_k)|^2}{|(f, g_1)|^2}\right).$$

Replacing $m$ by $m + 1$ and dividing, we obtain

$$\frac{(T^{m+1} f, f)}{(T^m f, f)} = \lambda_1 \, \frac{1 + \cdots}{1 + \cdots},$$

where the quantities denoted by $\cdots$ approach zero with increasing $m$ (since the fractions $\lambda_k/\lambda_1$, $k \ge 2$, are each strictly less than unity).

Hence, we have shown that $\lambda_1$, the largest eigenvalue, is given by the limiting relation

$$\lambda_1 = \lim_{m \to \infty} \frac{(T^{m+1} f, f)}{(T^m f, f)},$$

where $f$ is any vector which is not orthogonal to $g_1$. (Of course, since $g_1$ is itself unknown, there is a possibility of making an unfortunate choice for $f$, but in some cases [cf. Exercise 1] it is possible to guard against this occurrence.) Obviously, the more closely the direction of $f$ agrees with that of $g_1$, the faster is the convergence of the sequence of fractions $(T^{m+1} f, f)/(T^m f, f)$.

Now we shall show that the procedure just described furnishes the eigenvector $g_1$ as well as the eigenvalue $\lambda_1$. Clearly,

$$\|T^m f\| = \left(\sum_{k=1}^{n} \lambda_k^{2m} |(f, g_k)|^2\right)^{1/2} = \lambda_1^m |(f, g_1)| \{1 + \cdots\},$$

where, as before, $\cdots$ denotes an expression which approaches zero with increasing $m$. Hence,

$$\frac{T^m f}{\|T^m f\|} = \frac{(f, g_1)}{|(f, g_1)|}\, g_1 + \cdots.$$

Since $(f, g_1)/|(f, g_1)|$ is a constant of unit modulus, we may assume that it is equal to unity, for this merely amounts to replacing $g_1$ by another unit eigenvector associated with $\lambda_1$. Thus, we obtain the following remarkable formula for the unit eigenvector (or, more precisely, a unit eigenvector) associated with $\lambda_1$:

$$g_1 = \lim_{m \to \infty} \frac{T^m f}{\|T^m f\|}.$$

We now comment briefly on the possibility of extending the scope of the Kellogg procedure. First, it is not difficult to see, by a trivial change in the reasoning, that the method is effective even if the largest eigenvalue $\lambda_1$ is degenerate; all that is necessary is that the trial vector $f$ should not be orthogonal to the manifold spanned by the eigenvectors associated with $\lambda_1$.

Secondly, even if $T$ is not positive, $T^2$ is positive, with the same eigenvectors as $T$ and with eigenvalues which are the squares of those of $T$. If $T^2$ has a non-degenerate largest eigenvalue, then it is evident that the largest or smallest eigenvalue of $T$ (whichever is the largest in absolute value) is also non-degenerate, and the procedure described previously leads to this eigenvalue and to the corresponding eigenvector. However, complications may arise if the largest eigenvalue of $T^2$ is degenerate, in particular if the largest and smallest eigenvalues of $T$ are the negatives of each other. However, in Exercise 2 we indicate how this complication may be avoided.

Returning to the case that $T$ is known to be positive, let us suppose that $f$ is any vector such that $Tf \ne o$. Then, by the generalized Schwarz inequality, which, we recall, asserts that $|(Tf, g)|^2 \le (Tf, f)(Tg, g)$ for any positive operator $T$ and any vectors $f$ and $g$, it follows that $(Tf, f) > 0$. (Cf. Exercise 5-6 of Chapter 6.) Since $(T^2 f, f) = (Tf, Tf) > 0$, it follows that $T^2 f = T(Tf) \ne o$, and so, by repeating the preceding reasoning, we conclude that $(T(Tf), Tf) = (T^2 f, Tf) = (T^3 f, f) > 0$. Repeating this argument indefinitely, we conclude that $(T^k f, f) > 0$ for $k = 0, 1, 2, \ldots$. Now, in the generalized Schwarz inequality replace $f$ by $T^{(m-2)/2} f$ and $g$ by $T^{m/2} f$ for $m \ge 2$. (Of course, when $m$ is odd, $T^{(m-2)/2}$ and $T^{m/2}$ are understood to signify $T^{(m-3)/2} T^{1/2}$ and $T^{(m-1)/2} T^{1/2}$, respectively, where $T^{1/2}$ is the positive square root of $T$, whose existence and uniqueness have been demonstrated in the previous chapter.) We then obtain

$$|(T\,T^{(m-2)/2} f, T^{m/2} f)|^2 \le (T\,T^{(m-2)/2} f, T^{(m-2)/2} f)\,(T\,T^{m/2} f, T^{m/2} f).$$

Taking account of the hermitian character of all the operators involved, we obtain

$$(T^m f, f)^2 \le (T^{m-1} f, f)(T^{m+1} f, f)$$

or

$$\frac{(T^m f, f)}{(T^{m-1} f, f)} \le \frac{(T^{m+1} f, f)}{(T^m f, f)},$$

and so the ratios $(T^{m+1} f, f)/(T^m f, f)$ which appear in the Kellogg procedure must increase with $m$ to the largest eigenvalue. Of course this result does not hold in general if $T$ is not positive.
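A minimal sketch of the Kellogg procedure follows. It assumes a hypothetical positive real symmetric matrix with eigenvalues 3 and 1 (so the expected limits are known), a trial vector f = (1, 0) that is not orthogonal to the leading eigenvector, and real arithmetic throughout; the function name `kellogg` is illustrative only.

```python
import math

# Hypothetical positive hermitian (real symmetric) matrix with eigenvalues
# 3 and 1; a unit eigenvector for the eigenvalue 3 is (1, 1)/sqrt(2).
T = [[2.0, 1.0],
     [1.0, 2.0]]

def apply(T, x):
    return [sum(T[i][j] * x[j] for j in range(len(x))) for i in range(len(T))]

def dot(x, y):
    # Real inner product; adequate here because all entries are real.
    return sum(a * b for a, b in zip(x, y))

def kellogg(T, f, steps):
    """Return the ratios (T^{m+1} f, f) / (T^m f, f) for m = 0..steps-1,
    together with the normalized vector T^steps f / ||T^steps f||."""
    ratios = []
    v = list(f)                       # v holds T^m f
    for _ in range(steps):
        w = apply(T, v)               # w holds T^{m+1} f
        ratios.append(dot(w, f) / dot(v, f))
        v = w
    norm = math.sqrt(dot(v, v))
    return ratios, [a / norm for a in v]

ratios, g1_estimate = kellogg(T, [1.0, 0.0], 25)
```

For this example the first three ratios are 2, 2.5, and 2.8, and the sequence continues to increase toward 3 while the normalized iterate approaches (1, 1)/sqrt(2). For large matrices one would normalize the iterate at every step to avoid overflow; that refinement is omitted here to keep the correspondence with the formulas visible.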

Exercises 

1. Let $T$ be represented, with respect to a certain orthonormal basis $\{h_1, h_2, h_3\}$, by the matrix


Show that the vector $f = h_1 + h_2 + h_3$ may be safely used as a trial vector in the Kellogg procedure. With this choice of $f$, carry out several stages of the Kellogg procedure for determining both $\lambda_1$ and $g_1$. (Hint: Refer to Exercise 3-2.)

2. Let $T$ be hermitian with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Show that if $c$ is any real number, the operator $T + cI$ is also hermitian, with the same eigenvectors as $T$ and with eigenvalues $\lambda_1 + c, \lambda_2 + c, \ldots, \lambda_n + c$. Show how this fact may be used to obtain a monotone sequence of approximations for $\lambda_1$ (and also for $\lambda_n$).


§5. SPECTRAL REPRESENTATION OF 
HERMITIAN OPERATORS 

We shall present in this section the so-called spectral theorem for a 
hermitian operator. It is, in fact, perfectly trivial (in the finite-dimensional 
case, to which we are restricting attention in the present chapter), but it 
turns out to have a natural generalization which constitutes the major 
theorem concerning hermitian operators on infinite-dimensional Hilbert 
spaces. We shall allude briefly to this generalization in the following 
chapter. 

For convenience we now reverse our previous indexing of the eigenvalues of the hermitian operator $T$; we now enumerate them so that $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. As before, we let the unit vector $g_k$ denote an eigenvector associated with $\lambda_k$. (Of course, if any eigenvalue is degenerate the collection of eigenvectors associated with this eigenvalue is understood to be orthonormal. The reader may find it helpful on first reading to confine attention to the case that the eigenvalues are all simple and then observe the minor changes in reasoning needed to cover the general case.) For each real number $\lambda$, let each of the eigenvalues $\{\lambda_1, \lambda_2, \ldots, \lambda_k\}$ be $\le \lambda$ while the remaining eigenvalues exceed $\lambda$. (Of course, $k = k(\lambda)$; if $\lambda < \lambda_1$, the set $\{\lambda_1, \lambda_2, \ldots, \lambda_k\}$ is empty, while if $\lambda \ge \lambda_n$ the equality $k = n$ holds.) Let $S_\lambda$ denote the subspace spanned by the vectors $g_1, g_2, \ldots, g_k$ (so that $S_\lambda \subseteq S_\mu$ if $\lambda \le \mu$), and let $P_\lambda$ denote the projection operator with range $S_\lambda$.

We make the following observations about the family $\{P_\lambda\}$: (a) $P_\lambda = O$ if $\lambda < \lambda_1$; (b) $P_\lambda = I$ if $\lambda \ge \lambda_n$; (c) $P_\lambda P_\mu = P_\mu P_\lambda = P_{\min\{\lambda, \mu\}}$; (d) if $\lambda < \mu$, $P_\mu - P_\lambda$ is a projection; (e) if $\lambda$ is not an eigenvalue of $T$, then $P_\mu$ remains constant when $\mu$ varies over a sufficiently small interval containing $\lambda$; and (f) if $\lambda$ is an eigenvalue of $T$, then, for all sufficiently small positive $\epsilon$, $P_{\lambda+\epsilon} = P_\lambda$, but $P_{\lambda-\epsilon} \ne P_\lambda$; in fact, $P_{\lambda+\epsilon} - P_{\lambda-\epsilon}$ is the projection upon the subspace spanned by the $g$'s associated with the eigenvalue $\lambda$.

Roughly speaking, as $\lambda$ traverses $R$ from $-\infty$ to $+\infty$ the projection $P_\lambda$ grows from $O$ to $I$, the growth occurring at the eigenvalues of $T$.

The family $\{P_\lambda\}$ (i.e., the function sending $\lambda$ into $P_\lambda$) is called the resolution of the identity associated with $T$.

Examples 

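(a) If $T = O$, $P_\lambda = O$ for $\lambda < 0$, $P_\lambda = I$ for $\lambda \ge 0$.

(b) If $T = I$, $P_\lambda = O$ for $\lambda < 1$, $P_\lambda = I$ for $\lambda \ge 1$.

(c) If $T$ is a projection other than $O$ and $I$, then $P_\lambda = O$ for $\lambda < 0$, $P_\lambda = I - T$ for $0 \le \lambda < 1$, $P_\lambda = I$ for $\lambda \ge 1$.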

Now we show how the operator $T$ can be expressed, or synthesized, in terms of the corresponding resolution of the identity. For simplicity of exposition, let us assume that the eigenvalues are all non-degenerate, so that $\lambda_1 < \lambda_2 < \cdots < \lambda_n$. Since the eigenvectors $g_1, g_2, \ldots, g_n$ are orthonormal and span the space, we know that for every vector $f$ the following equalities must hold:

$$f = \sum_{k=1}^{n} (f, g_k) g_k, \tag{5-1}$$

$$Tf = \sum_{k=1}^{n} (f, g_k) Tg_k = \sum_{k=1}^{n} \lambda_k (f, g_k) g_k. \tag{5-2}$$

Now, let us choose numbers $\mu_0, \mu_1, \ldots, \mu_n$ such that $\mu_0 < \lambda_1 < \mu_1 < \lambda_2 < \cdots < \mu_{n-1} < \lambda_n < \mu_n$. Then clearly, for $1 \le k \le n$, $P_{\mu_k} - P_{\mu_{k-1}}$ is the projection on the one-dimensional manifold spanned by $g_k$, and so $(P_{\mu_k} - P_{\mu_{k-1}}) f = (f, g_k) g_k$. Hence, (5-2) may be rewritten as follows:

$$Tf = \sum_{k=1}^{n} \lambda_k (P_{\mu_k} - P_{\mu_{k-1}}) f. \tag{5-3}$$

Thus, dropping $f$ from both sides, we obtain the following remarkable representation of $T$:

$$T = \sum_{k=1}^{n} \lambda_k (P_{\mu_k} - P_{\mu_{k-1}}). \tag{5-4}$$

It is helpful to replace $P_{\mu_k} - P_{\mu_{k-1}}$ by the symbol $dP(\lambda_k)$, meaning the growth of $P_\lambda$ as $\lambda$ crosses the number $\lambda_k$ (from left to right). Then we obtain

$$T = \sum_{k=1}^{n} \lambda_k \, dP(\lambda_k). \tag{5-5}$$

This is known as the spectral representation of $T$; we have proven that every hermitian operator on a finite-dimensional Hilbert space possesses a spectral representation, involving the eigenvalues of the operator and the resolution of the identity associated with the operator. This result is known as the spectral theorem (for finite-dimensional spaces). (The reader should have no difficulty in seeing that (5-5) remains correct even if $T$ possesses one or more degenerate eigenvalues.)
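The synthesis of T from its resolution of the identity can be illustrated on a small example. This is a sketch under stated assumptions: the matrix [[2, 1], [1, 2]] and its eigenpairs are a hypothetical example supplied by hand (eigenvalues 1 and 3), the separating numbers 0, 2, 4 play the role of the intermediate points between eigenvalues, and the helper names (`resolution`, `outer`, and so on) are illustrative.

```python
import math

# Hypothetical example: T = [[2, 1], [1, 2]] has eigenvalues 1 and 3 with
# orthonormal eigenvectors (1, -1)/sqrt(2) and (1, 1)/sqrt(2).
s = 1 / math.sqrt(2)
T = [[2.0, 1.0], [1.0, 2.0]]
EIGENPAIRS = [(1.0, [s, -s]), (3.0, [s, s])]   # ascending eigenvalues

def outer(g):
    # Matrix of the projection f -> (f, g) g for a real unit vector g.
    return [[a * b for b in g] for a in g]

def add(X, Y):
    return [[x + y for x, y in zip(r, q)] for r, q in zip(X, Y)]

def scale(c, X):
    return [[c * x for x in r] for r in X]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(X, Y, tol=1e-12):
    return all(abs(x - y) <= tol for r, q in zip(X, Y) for x, y in zip(r, q))

def resolution(lam):
    """P_lam: projection on the span of eigenvectors with eigenvalue <= lam."""
    P = [[0.0, 0.0], [0.0, 0.0]]
    for value, g in EIGENPAIRS:
        if value <= lam:
            P = add(P, outer(g))
    return P

# Separating numbers: mu_0 < lambda_1 < mu_1 < lambda_2 < mu_2.
mus = [0.0, 2.0, 4.0]

# Sum of lambda_k times the jump of P across lambda_k.
S = [[0.0, 0.0], [0.0, 0.0]]
for k, (value, _) in enumerate(EIGENPAIRS, start=1):
    jump = add(resolution(mus[k]), scale(-1.0, resolution(mus[k - 1])))
    S = add(S, scale(value, jump))
```

Up to rounding, `S` reproduces `T`, and the product of `resolution(2.0)` with `resolution(4.0)` equals `resolution(2.0)`, illustrating the rule that the product of two members of the family is the member with the smaller index.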

The right side of (5-5) will (hopefully) suggest to the reader that an integration process of some sort is lurking in the background. If the reader has studied the (Riemann-)Stieltjes integral he should have no difficulty in seeing that the equality

$$(Tf, g) = \sum_{k=1}^{n} \lambda_k (dP(\lambda_k) f, g), \tag{5-6}$$

which follows immediately from (5-5), can also be written in the form

$$(Tf, g) = \int_{-\infty}^{\infty} \lambda \, d(P_\lambda f, g). \tag{5-7}$$

(For the benefit of the reader who has not studied the Stieltjes integral, we provide a very brief account in an appendix.) Suppressing the inner-product symbol in (5-7) and the vectors $f$ and $g$, we are led to rewrite (5-5) in the form

$$T = \int_{-\infty}^{\infty} \lambda \, dP_\lambda. \tag{5-8}$$




Exercises 

1. Let the operator $T$ possess, with respect to a certain orthonormal basis $\{h_1, h_2, h_3\}$, the representative matrix

$$T_h = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}.$$

Work out the resolution of the identity associated with $T$.

2. Let $T$ be any hermitian operator and let $p$ denote any polynomial in one variable (complex coefficients permitted). Show that, in analogy with (5-7), the equality

$$(p(T) f, g) = \int_{-\infty}^{\infty} p(\lambda) \, d(P_\lambda f, g)$$

holds, and that $p(T)$ is hermitian iff $p(\lambda)$ is real when $\lambda$ is an eigenvalue of $T$. (In particular, if $p$ possesses real coefficients, this condition is satisfied.)

3. As in Exercise 2, let $T$ be hermitian, and let $p$ and $q$ be two polynomials. Show that $p(T) = q(T)$ iff $p(\lambda) = q(\lambda)$ whenever $\lambda$ is an eigenvalue of $T$.


§6. SPECTRAL REPRESENTATION OF 
NORMAL OPERATORS 

In the previous chapter we have seen that normal operators share 
with hermitian operators the property that eigenvectors associated with 
distinct eigenvalues are orthogonal. Thus, if the normal operator N 
defined on a finite-dimensional space has no degenerate eigenvalues, it is 
easily seen that the eigenvectors form an orthogonal basis, and it is not 
difficult to prove that even when degeneracy occurs it is still possible to 
choose an orthonormal basis consisting entirely of eigenvectors. (Cf.
Exercise 1.)

Recalling that every normal operator $N$ can be expressed in the form $A + iB$, where $A$ and $B$ are hermitian operators which commute with each other, let the eigenvectors $g_1, g_2, \ldots, g_n$ of $N$ form an orthonormal basis, with corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Since the $\lambda_k$'s are, in general, non-real, we write

$$\lambda_k = \alpha_k + i\beta_k, \qquad \alpha_k \text{ and } \beta_k \text{ real}.$$

Then we know that $N^* g_k = \bar{\lambda}_k g_k$, and it is then readily seen that $g_k$ is an eigenvector of both $A$ and $B$:

$$A g_k = \alpha_k g_k, \qquad B g_k = \beta_k g_k.$$
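The decomposition of a normal operator into commuting hermitian parts can be checked on a small example. The sketch below assumes the standard choice A = (N + N*)/2 and B = (N - N*)/(2i), which is supplied here rather than quoted from the text, and uses a hypothetical normal matrix N = [[1, 2], [-2, 1]] whose eigenvalue 1 + 2i has the unit eigenvector (1, i)/sqrt(2). All helper names are illustrative.

```python
import math

# Hypothetical normal matrix (it commutes with its adjoint).
N = [[1 + 0j, 2 + 0j],
     [-2 + 0j, 1 + 0j]]

def adjoint(M):
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

def combine(a, X, b, Y):
    """Entrywise a*X + b*Y."""
    n = len(X)
    return [[a * X[i][j] + b * Y[i][j] for j in range(n)] for i in range(n)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def close(X, Y, tol=1e-12):
    return all(abs(x - y) <= tol for r, q in zip(X, Y) for x, y in zip(r, q))

N_star = adjoint(N)
A = combine(0.5, N, 0.5, N_star)       # A = (N + N*)/2
B = combine(-0.5j, N, 0.5j, N_star)    # B = (N - N*)/(2i), since 1/(2i) = -i/2

# Unit eigenvector of N for the eigenvalue alpha + i*beta = 1 + 2i.
s = 1 / math.sqrt(2)
g = [s, s * 1j]
alpha, beta = 1.0, 2.0
```

In this example A is the identity and B has eigenvalue 2 on g, so that g is simultaneously an eigenvector of N, N*, A, and B with eigenvalues 1 + 2i, 1 - 2i, 1, and 2 respectively.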

Thus, taking account of the results of the previous section we see that we may associate with $A$ and $B$, respectively, resolutions of the identity $P_\mu$ and $Q_\nu$:

$$A = \int_{-\infty}^{\infty} \mu \, dP_\mu, \qquad B = \int_{-\infty}^{\infty} \nu \, dQ_\nu.$$

It should be quite obvious, from the fact that $P_\mu$ and $Q_\nu$ are projections on subspaces spanned by none, some, or all of the vectors $g_1, g_2, \ldots, g_n$, that the equality

$$(P_{\mu_1} - P_{\mu_2})(Q_{\nu_1} - Q_{\nu_2}) = (Q_{\nu_1} - Q_{\nu_2})(P_{\mu_1} - P_{\mu_2})$$

must hold whenever $\mu_1 > \mu_2$ and $\nu_1 > \nu_2$, and so the operator defined by each side of the preceding equation is itself a projection. For any pair of vectors $f$, $g$ it is now easy, although perhaps a little tedious, to justify the following analogues of (5-7) and (5-8):

$$(Nf, g) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (\mu + i\nu) \, d(P_\mu Q_\nu f, g),$$

$$N = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (\mu + i\nu) \, dP_\mu \, dQ_\nu.$$

It is customary to set $\mu + i\nu$ equal to the complex variable $z$ and to define the projection $E_z$ as the product $P_\mu Q_\nu$ $(= Q_\nu P_\mu)$; the preceding formulas are then rewritten as follows:

$$(Nf, g) = \iint_{\text{complex plane}} z \, d(E_z f, g),$$

$$N = \iint_{\text{complex plane}} z \, dE_z.$$

Of course, $E_z$ is termed the resolution of the identity associated with $N$.

Exercises 

1. Show that an operator $N$ is normal iff it can be represented, with respect to some orthonormal basis, by a diagonal matrix.

2. Show that an operator $U$ is unitary iff it can be represented, with respect to some orthonormal basis, by a diagonal matrix whose diagonal entries are all of absolute value one.

3. Prove that every normal operator can be expressed in a unique manner as the product of a positive operator and a unitary operator and that the latter two operators commute.

4. (a) Let $T$ be hermitian, with resolution of the identity $P_\lambda$, and let $M$ be any subset of the real number system. Give a reasonable interpretation of $\int_M \lambda \, dP_\lambda$.

(b) Similarly, let $N$ be normal, with resolution of the identity $E_z$, and let $M$ be any subset of the complex number system. Give a reasonable interpretation of $\iint_M z \, dE_z$.

5. Let $U$ be a unitary operator. Define, on the interval $[0, 2\pi]$, an increasing family of projections $\{P_\theta\}$ (i.e., $P_{\theta_1} \le P_{\theta_2}$ if $\theta_1 \le \theta_2$) such that $P_0 = O$, $P_{2\pi} = I$, and such that, with a reasonable interpretation of the integral,

$$U = \int_0^{2\pi} e^{i\theta} \, dP_\theta.$$

CHAPTER 8


ELEMENTS OF 
SPECTRAL THEORY IN 
INFINITE-DIMENSIONAL 
HILBERT SPACES 


The generalization to infinite-dimensional spaces of the theory 
presented in the previous chapter is in a very incomplete state of develop- 
ment. In this chapter we shall give a brief presentation of some of the 
most important ideas that have emerged during the development of the 
theory until the present. Most of our exposition will refer to Hilbert 
spaces, for which the present state of the theory, incomplete though it may 
be, is much more advanced than for Banach spaces. 


§1. COMPLETELY CONTINUOUS OPERATORS 

The concept of complete continuity plays an important role in the 
theory of operators and in the application of the theory to many problems 
of hard (as distinguished from soft, or abstract) analysis. 

Definition 1: An operator $T$ on a normed linear space is said to be "completely continuous" (the terms "compact" and "totally bounded" are also used) if for every bounded sequence of vectors $f_1, f_2, f_3, \ldots$ the corresponding sequence $Tf_1, Tf_2, Tf_3, \ldots$ contains a subsequence which is convergent.





Examples 

(a) Every operator on a finite-dimensional normed linear space is 
completely continuous. (While this fact should be immediately obvious, 
it is also guaranteed by the corollary to Theorem 3.) 

(b) The zero operator on any normed linear space is obviously 
completely continuous. 

(c) If $S$ is a subspace of an infinite-dimensional Hilbert space, then the projection on $S$ is completely continuous iff $S$ is finite-dimensional. (Cf. Exercise 1.) In particular, it follows that the identity operator on an infinite-dimensional Hilbert space is not completely continuous. (Cf. Exercise 2.)

(d) An especially important example of a completely continuous operator is furnished by an integral transform. Consider the (real or complex) Banach space $C([a, b])$, where $a$ and $b$ are finite. (Of course, it is understood that the maximum norm is employed.) Let $K$ (for kernel) be a continuous scalar-valued function defined on the square $[a, b] \times [a, b]$, and for each member $f$ of $C([a, b])$ let $Tf = g$, where

$$g(x) = \int_a^b K(x, y) f(y) \, dy.$$

Obviously $Tf \in C([a, b])$; in fact, this would still be true under the weaker assumption that $f \in L^1([a, b])$. Furthermore, $T$ is evidently linear, and the trivial inequality

$$|Tf(x)| \le \{\max |K(x, y)|\} \cdot (b - a) \cdot \|f\|$$

shows that $T$ is bounded, with norm not exceeding $\{\max |K(x, y)|\} \cdot (b - a)$. Thus, $T$ is an operator.

Next, choose any two values of $x$ in $[a, b]$, say $x_1$ and $x_2$. Then

$$|Tf(x_2) - Tf(x_1)| \le \int_a^b |K(x_2, y) - K(x_1, y)| \cdot |f(y)| \, dy \le (b - a) \cdot \|f\| \cdot \max_y |K(x_2, y) - K(x_1, y)|.$$

Since $K$ is continuous on a compact subset of the plane, we can, given any $\delta > 0$, find a positive number $\eta$ (depending only on $\delta$ and on the given kernel $K$) such that $|K(x_2, y) - K(x_1, y)| < \delta$ whenever $|x_2 - x_1| < \eta$. Now let positive numbers $\epsilon$ and $A$ be given, let $f$ be subjected to the restriction $\|f\| \le A$, let $\delta = \epsilon/((b - a)A)$, and then choose $\eta$ corresponding to this choice of $\delta$. Then, whenever $|x_2 - x_1| < \eta$, the inequality $|Tf(x_2) - Tf(x_1)| < (b - a)A\delta = \epsilon$ must hold. Hence, the family of all vectors $\{f\}$ whose norms are $\le A$ gives rise to a family of vectors $\{Tf\}$ which are equicontinuous and uniformly bounded. By the Ascoli-Arzelà theorem it follows that every sequence chosen from the collection $\{Tf\}$ contains a subsequence which converges uniformly (i.e., in the norm of $C([a, b])$). Thus, the operator $T$ is completely continuous.
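The two estimates used in this example (the uniform bound and the equicontinuity bound) can be observed on a discretized version of the integral transform. The following sketch rests on several assumptions that are not in the text: the interval is [0, 1], the kernel exp(-|x - y|) and the function sin(5y) are hypothetical choices, and the integral is replaced by a midpoint-rule sum, for which both inequalities hold exactly with the maxima taken over the grid.

```python
import math

A_END, B_END, CELLS = 0.0, 1.0, 200          # interval [a, b] and grid size
STEP = (B_END - A_END) / CELLS
NODES = [A_END + (k + 0.5) * STEP for k in range(CELLS)]   # cell midpoints

def kernel(x, y):
    # Hypothetical continuous kernel on the square [a, b] x [a, b].
    return math.exp(-abs(x - y))

def f(y):
    # Hypothetical continuous function on [a, b].
    return math.sin(5 * y)

def Tf(x):
    """Midpoint-rule approximation to the integral of K(x, y) f(y) dy."""
    return sum(kernel(x, y) * f(y) for y in NODES) * STEP

sup_K = max(abs(kernel(x, y)) for x in NODES for y in NODES)
sup_f = max(abs(f(y)) for y in NODES)

# Uniform bound: |Tf(x)| <= max|K| * (b - a) * ||f||.
bound = sup_K * (B_END - A_END) * sup_f

def modulus(x1, x2):
    """(b - a) * ||f|| * max over y of |K(x2, y) - K(x1, y)|, on the grid."""
    spread = max(abs(kernel(x2, y) - kernel(x1, y)) for y in NODES)
    return (B_END - A_END) * sup_f * spread
```

The quantity `modulus(x1, x2)` depends on f only through its norm, which is the point of the equicontinuity argument: the same estimate serves every f in a bounded set.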


Theorem 1: Let B be any Banach space. Then

(a) Any (finite) product of operators is completely continuous if at 
least one of the factors is completely continuous. 

(b) Any (finite) linear combination of completely continuous operators 
is also completely continuous. 

(c) The uniform limit of completely continuous operators is completely 
continuous. (Cf. Exercise 3.) 


Proof: (a) It obviously suffices to confine attention to a product of two operators, $ST$. If $T$ is completely continuous and if $f_1, f_2, f_3, \ldots$ is any bounded sequence of vectors, there exists a subsequence of the sequence $Tf_1, Tf_2, Tf_3, \ldots$ which converges. Denoting this subsequence as $g_1, g_2, g_3, \ldots$ and its limit as $g$, we obtain $\|Sg - Sg_n\| = \|S(g - g_n)\| \le \|S\| \cdot \|g - g_n\| \to 0$. Thus the sequence $Sg_1, Sg_2, Sg_3, \ldots$ is a convergent subsequence of the sequence $STf_1, STf_2, STf_3, \ldots$, and so $ST$ is completely continuous.

On the other hand, if $S$ is assumed to be completely continuous, we argue as follows. Given any bounded sequence of vectors $f_1, f_2, f_3, \ldots$, the sequence $Tf_1, Tf_2, Tf_3, \ldots$ is also bounded (since $T$ is bounded), and so it is possible to extract from the sequence $S(Tf_1), S(Tf_2), S(Tf_3), \ldots$ a convergent subsequence. Therefore, $ST$ is completely continuous.

(b) Since any scalar multiple of a completely continuous operator is obviously completely continuous, it suffices to prove that the sum of two completely continuous operators is also completely continuous, for finite induction then furnishes the desired result. Therefore, suppose that $S$ and $T$ are both completely continuous, and let $f_1, f_2, f_3, \ldots$ be any bounded sequence of vectors. Then we can select a subsequence $g_1, g_2, g_3, \ldots$ of this sequence such that the sequence $Tg_1, Tg_2, Tg_3, \ldots$ converges. From the sequence $g_1, g_2, g_3, \ldots$ we can in turn select a subsequence $h_1, h_2, h_3, \ldots$ such that the sequence $Sh_1, Sh_2, Sh_3, \ldots$ converges. Since the sequence $Th_1, Th_2, Th_3, \ldots$ certainly converges, it follows that the sequence $(S + T)h_1, (S + T)h_2, (S + T)h_3, \ldots$ converges. Thus, the sequence $(S + T)f_1, (S + T)f_2, (S + T)f_3, \ldots$ contains a convergent subsequence, and so $S + T$ is completely continuous.

(c) Let the sequence $S_1, S_2, S_3, \ldots$ of completely continuous operators converge uniformly to $S$, and let $f_1, f_2, f_3, \ldots$ be any bounded sequence of vectors. From this sequence we extract a subsequence $f_{11}, f_{12}, f_{13}, \ldots$ such that $\lim_{n\to\infty} S_1 f_{1n}$ exists. From this subsequence we extract a further subsequence $f_{21}, f_{22}, f_{23}, \ldots$ such that $\lim_{n\to\infty} S_2 f_{2n}$ exists. Clearly, $\lim_{n\to\infty} S_1 f_{2n}$ also exists. Continuing this procedure, we obtain for each positive integer $k$ a subsequence $f_{k1}, f_{k2}, f_{k3}, \ldots$ of the original sequence such that $\lim_{n\to\infty} S_j f_{kn}$ exists for $j = 1, 2, \ldots, k$. Now consider the diagonal sequence $f_{11}, f_{22}, f_{33}, \ldots$. This is clearly a subsequence of the original sequence $f_1, f_2, f_3, \ldots$, and, aside perhaps from a finite number (depending on $k$) of initial terms, it is a subsequence of the sequence $f_{k1}, f_{k2}, f_{k3}, \ldots$. Hence, $\lim_{n\to\infty} S_k f_{nn}$ exists for each index $k$. Given any $\epsilon > 0$, we first determine an index $k$ such that $\|S - S_k\| < \epsilon$, then an index $N$ such that $\|S_k f_{nn} - S_k f_{mm}\| < \epsilon$ whenever $m$ and $n$ both exceed $N$. Under the restriction just imposed on $m$ and $n$, we obtain

$$\|Sf_{nn} - Sf_{mm}\| = \|(Sf_{nn} - S_k f_{nn}) + (S_k f_{nn} - S_k f_{mm}) + (S_k f_{mm} - Sf_{mm})\|$$

$$\le \|(S - S_k) f_{nn}\| + \|S_k f_{nn} - S_k f_{mm}\| + \|(S - S_k) f_{mm}\|$$

$$\le \|S - S_k\| \cdot \|f_{nn}\| + \|S - S_k\| \cdot \|f_{mm}\| + \epsilon$$

$$< 2\epsilon \sup \|f_n\| + \epsilon = \epsilon(1 + 2 \sup \|f_n\|).$$

Since $\epsilon$ can be chosen arbitrarily close to zero, it follows that the sequence $Sf_{11}, Sf_{22}, Sf_{33}, \ldots$ is Cauchy; since $B$ is complete, the sequence is convergent, and so $S$ is completely continuous.

Definition 2: An operator defined on any normed linear space is 
said to be degenerate if its range {which is obviously a linear manifold) is 
finite-dimensional. 

Theorem 3: Let $T$ be a degenerate operator on the normed linear space $N$, and let the vectors $\{g_1, g_2, \ldots, g_n\}$ constitute a basis of the range of $T$. Then there exist bounded linear functionals $\{l_1, l_2, \ldots, l_n\}$ such that, for every vector $f$ in $N$,

$$Tf = l_1(f) g_1 + l_2(f) g_2 + \cdots + l_n(f) g_n.$$

Proof: For any $f$, $Tf$ must be expressible (in a unique manner) as a linear combination of the $g$'s; the linearity of $T$ clearly implies that the coefficients must be (scalar-valued) linear functions of $f$. It therefore remains only to show that the linear functionals $l_1, l_2, \ldots, l_n$ are bounded. By Theorem 2-4 of Chapter 4, we are assured of the existence of a positive constant $\delta$ such that $\|Tf\| \ge \delta \sum_{k=1}^{n} |l_k(f)|$, and so, for each index $k$, we obtain $|l_k(f)| \le (1/\delta) \|Tf\|$. The boundedness of $T$ furnishes the further inequality $|l_k(f)| \le ((1/\delta) \|T\|) \|f\|$, and so the boundedness of $l_k$ is established.

Corollary: Any degenerate operator on any normed linear space is 
completely continuous. 



Proof: Part (b) of Theorem 1 shows that it suffices to prove that for any bounded linear functional $l$ and any non-zero vector $g$ the equation $Tf = l(f)g$ defines a completely continuous operator $T$. Clearly, $T$ is bounded and linear. Given any bounded sequence $f_1, f_2, f_3, \ldots$, we form the corresponding sequence of scalars $l(f_1), l(f_2), l(f_3), \ldots$. Each term of the sequence is bounded in absolute value by $\|l\| \cdot \sup \|f_n\|$, and by the Bolzano-Weierstrass theorem it is possible to extract a subsequence $\tilde{f}_1, \tilde{f}_2, \tilde{f}_3, \ldots$ of the original sequence of vectors such that $\lim_{n\to\infty} l(\tilde{f}_n)$ exists. Then clearly $T\tilde{f}_n \to \{\lim_{n\to\infty} l(\tilde{f}_n)\} g$, and so $T$ is completely continuous.

Theorem 4: Let $T$ be a completely continuous operator on a Banach space and let $\mu$ be a non-zero eigenvalue of $T$. Then $\mu$ is of finite degeneracy. (The example $T = O$ shows that the non-vanishing of $\mu$ is an essential condition.)

Proof: Suppose that the manifold of all solutions of the equation $Tf = \mu f$ is of infinite dimension. By taking account of Theorem 2-7 of Chapter 4 we easily see that there would exist a sequence of unit eigenvectors $f_1, f_2, f_3, \ldots$ such that $\|f_i - f_j\| > \tfrac{1}{2}$ whenever $i \ne j$. Then, whenever $i \ne j$, the inequality $\|Tf_i - Tf_j\| = \|\mu(f_i - f_j)\| > \tfrac{1}{2}|\mu|$ would hold, and so the sequence $Tf_1, Tf_2, Tf_3, \ldots$ could not possibly contain a convergent subsequence, but this would contradict the hypothesis that $T$ is completely continuous.

Exercises 

1. Prove the assertion made in the first sentence of Example (c).

2. Prove that the identity operator on any infinite-dimensional Banach space is not completely continuous.

3. Prove that part (c) of Theorem 1 becomes false if "uniform" is replaced by either "strong" or "weak."

4. Give an alternative discussion of Example (d) which is based on part (c) of Theorem 1, the corollary to Theorem 3, and the two-dimensional version of the Weierstrass approximation theorem (cf. Appendix D).

5. If $T$ is an operator on a Hilbert space, show that $T$ is completely continuous iff $T^*$ is completely continuous.


§2. SPECTRAL ANALYSIS OF A HERMITIAN 
COMPLETELY CONTINUOUS OPERATOR 

We saw in the previous chapter that every linear transformation on a 
finite-dimensional complex linear space has at least one eigenvalue. 




Furthermore, under suitable hypotheses it was shown that it is possible to 
form a basis consisting of eigenvectors. However, we encounter difficulties 
immediately when we try to extend these results to infinite-dimensional 
spaces. Exercise 7-3 of Chapter 6 furnishes an example of a hermitian 
operator which possesses no eigenvalues, demonstrating that it is not 
possible to carry over to infinite-dimensional Hilbert spaces the elegant 
and simple description of the structure of hermitian operators and the
characterization of their eigenvalues and eigenvectors as solutions of
certain extremal problems, which were obtained in the finite-dimensional
case. However, we shall now see that the finite-dimensional theory of
hermitian operators does possess a simple generalization to infinite-dimensional
Hilbert spaces when the additional condition of complete
continuity is imposed.

Given on the infinite-dimensional Hilbert space $H$ any hermitian
operator $T$ (not necessarily completely continuous), we can choose,
exactly as in the finite-dimensional case, a sequence of unit vectors
$g_1, g_2, g_3, \ldots$ such that $|(Tg_n, g_n)| \to \|T\|$, and since $(Tg_n, g_n)$ is real we may
assume further that $(Tg_n, g_n) \to \mu_1$, where $\mu_1 = \pm\|T\|$. Then

$$\|Tg_n - \mu_1 g_n\|^2 = \|Tg_n\|^2 - 2\mu_1 (Tg_n, g_n) + \mu_1^2 \|g_n\|^2 \le \|T\|^2 + \mu_1^2 - 2\mu_1 (Tg_n, g_n) = 2\mu_1^2 - 2\mu_1 (Tg_n, g_n) \to 0.$$

Hence, $Tg_n - \mu_1 g_n \to o$. We now impose on $T$ the condition of complete
continuity and select (if necessary) a subsequence of the $g_n$'s, which we
denote $h_1, h_2, \ldots$, such that the sequence $Th_1, Th_2, \ldots$ converges.
It then follows that the sequence $\mu_1 h_1, \mu_1 h_2, \ldots$ converges, and so
(setting aside the trivial case $\mu_1 = 0$, corresponding to $T = O$) the
sequence $h_1, h_2, h_3, \ldots$ must also converge; we denote the limit of this
sequence as $f_1$. Then, referring back to the limiting relation
$Tg_n - \mu_1 g_n \to o$, we conclude that $Tf_1 = \mu_1 f_1$. Since $\|f_1\| = 1$, we have shown that
every operator $T$ which is hermitian and completely continuous and not
the zero operator possesses a non-zero eigenvalue, in particular $\pm\|T\|$.
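The argument above can be tried out in the finite-dimensional setting. The following is a minimal sketch, assuming a hypothetical real symmetric 2×2 matrix (eigenvalues 2 and -3, chosen only for illustration): it scans unit vectors $g$ to approximate the extreme value of $(Tg, g)$ and then checks that the maximizing vector nearly satisfies $Tg = \mu_1 g$. The matrix, the grid of sample vectors, and all function names are assumptions made for this illustration and do not come from the text.

```python
import math

# Hypothetical symmetric matrix with eigenvalues 2 and -3, so ||T|| = 3.
T = [[1.0, 2.0], [2.0, -2.0]]

def apply(T, g):
    """Return the vector T g for a 2x2 matrix T."""
    return [T[0][0] * g[0] + T[0][1] * g[1],
            T[1][0] * g[0] + T[1][1] * g[1]]

def inner(u, v):
    """Real inner product (u, v)."""
    return u[0] * v[0] + u[1] * v[1]

def extremal_unit_vector(T, samples=20000):
    """Scan unit vectors g and return (q, g) with q = (Tg, g) of largest absolute value."""
    best_q, best_g = 0.0, [1.0, 0.0]
    for i in range(samples):
        theta = math.pi * i / samples
        g = [math.cos(theta), math.sin(theta)]
        q = inner(apply(T, g), g)
        if abs(q) > abs(best_q):
            best_q, best_g = q, g
    return best_q, best_g

mu1, g = extremal_unit_vector(T)
Tg = apply(T, g)
# Norm of T g - mu1 g; it should be small if g is close to an eigenvector.
residual = math.hypot(Tg[0] - mu1 * g[0], Tg[1] - mu1 * g[1])
print(mu1, residual)
```

In this example the extreme value is attained with the negative sign, which illustrates why the text writes $\mu_1 = \pm\|T\|$ rather than $\mu_1 = \|T\|$.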

The reader should have no difficulty in seeing that, exactly as in the
finite-dimensional case, we may now restrict $T$ to the orthogonal
complement of the one-dimensional subspace spanned by $f_1$ and that if $T$ does not
reduce to the zero operator on the orthogonal complement, a second
non-zero eigenvalue $\mu_2$ and a corresponding (unit) eigenvector $f_2$ can be
obtained; furthermore, $|\mu_2| \le |\mu_1|$. By continuing this procedure as
long as possible (that is, as long as $T$ does not reduce to the zero operator
on the orthogonal complement of the eigenvectors already obtained) we
obtain either a finite or infinite sequence $\mu_1, \mu_2, \mu_3, \ldots$ of non-zero
eigenvalues and a corresponding sequence $f_1, f_2, f_3, \ldots$ of eigenvectors.

If this procedure terminates in a finite number of steps (say after
obtaining the eigenvalues $\mu_1, \mu_2, \ldots, \mu_N$ and eigenvectors $f_1, f_2, \ldots, f_N$),
then every vector orthogonal to these eigenvectors is annihilated by $T$.
By choosing an orthonormal basis in the subspace of vectors orthogonal
to the $f_k$'s ($1 \le k \le N$) and combining the two collections of orthonormal
vectors, we obviously obtain an orthonormal basis of the entire Hilbert
space. Furthermore, every vector $f$ can be expressed in the form

$$f = \sum_{k=1}^{N} (f, f_k) f_k + \tilde{f},$$

where $T\tilde{f} = o$, and so

$$Tf = \sum_{k=1}^{N} (f, f_k)\, Tf_k = \sum_{k=1}^{N} \mu_k (f, f_k) f_k. \tag{2-1}$$

On the other hand, if an infinite sequence $\mu_1, \mu_2, \ldots$ of non-zero
eigenvalues is obtained, it may still occur that the corresponding
eigenvectors $f_1, f_2, f_3, \ldots$ do not span the entire Hilbert space. (Cf. Exercise
4.) However, whether or not the eigenvectors $f_1, f_2, f_3, \ldots$ span the
space, it is easily seen that the obvious generalization of (2-1) holds:

$$Tf = \sum_{k=1}^{\infty} \mu_k (f, f_k) f_k. \tag{2-2}$$
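Formula (2-1) can be checked exactly in a small example. The sketch below assumes a hypothetical symmetric operator on a three-dimensional space with non-zero eigenvalues 2 and -3 and a one-dimensional null space; the matrix, the eigenvectors, and the helper names are illustrative assumptions. The eigenvectors are left unnormalized so that exact rational arithmetic can be used: with $f_k = v_k/\|v_k\|$, the term $(f, f_k) f_k$ equals $(f, v_k)\, v_k / (v_k, v_k)$.

```python
from fractions import Fraction as F

# Hypothetical hermitian (real symmetric) operator; the third basis vector is annihilated.
T = [[F(1), F(2), F(0)],
     [F(2), F(-2), F(0)],
     [F(0), F(0), F(0)]]

# Mutually orthogonal, unnormalized eigenvectors for the non-zero eigenvalues.
eigenpairs = [(F(2), [F(2), F(1), F(0)]),
              (F(-3), [F(1), F(-2), F(0)])]

def apply(T, f):
    """Return the vector T f."""
    return [sum(row[j] * f[j] for j in range(len(f))) for row in T]

def inner(u, v):
    """Real inner product (u, v)."""
    return sum(a * b for a, b in zip(u, v))

def spectral_sum(f):
    """Sum of mu_k (f, f_k) f_k over the non-zero eigenvalues, as in (2-1)."""
    total = [F(0)] * len(f)
    for mu, v in eigenpairs:
        coeff = mu * inner(f, v) / inner(v, v)
        total = [t + coeff * vi for t, vi in zip(total, v)]
    return total

f = [F(3), F(-1), F(5)]
print(spectral_sum(f), apply(T, f))
```

The component of `f` along the third basis vector plays the role of the remainder that $T$ annihilates, so it contributes nothing to either side.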

Next, we point out that, when an infinite sequence of non-zero
eigenvalues is obtained, the limiting relation $\lim_{k\to\infty} \mu_k = 0$ must hold.
The proof is very simple. Since the procedure employed guarantees that
the chain of inequalities $|\mu_1| \ge |\mu_2| \ge |\mu_3| \ge \cdots$ must hold, the quantity
$\lim_{k\to\infty} |\mu_k|$ must exist and be non-negative. If the limit were a positive
number, say $\alpha$, then the following inequality would be valid for any two
distinct indices $j$ and $k$ (since $(f_j, f_k) = 0$):

$$\|Tf_j - Tf_k\|^2 = \|\mu_j f_j - \mu_k f_k\|^2 = \mu_j^2 + \mu_k^2 \ge 2\alpha^2.$$

Thus, the sequence $Tf_1, Tf_2, Tf_3, \ldots$ could not contain a convergent
subsequence, contradicting the hypothesis of complete continuity.

We now turn to the task of generalizing equations (5-7) and (5-8) of
Chapter 7. For any real number $\lambda$ we define $\mathfrak{M}_\lambda$ as the subspace spanned
by the eigenvectors associated with the eigenvalues of $T$ which are $\le \lambda$.
It is readily seen that $\mathfrak{M}_\lambda$ is finite-dimensional if $\lambda$ is negative and that $\mathfrak{M}_\lambda^{\perp}$
is finite-dimensional if $\lambda$ is positive; in particular, $\mathfrak{M}_\lambda = \{o\}$ if $\lambda < -\|T\|$
and $\mathfrak{M}_\lambda = H$ if $\lambda \ge \|T\|$. We then define $P_\lambda$ as the projection on $\mathfrak{M}_\lambda$.
Exactly as in the finite-dimensional case, we refer to $\{P_\lambda\}$ as the resolution
of the identity associated with the operator $T$, and with very little effort
(cf. Exercise 7) we can justify the following formula for $(Tf, g)$ as a
Stieltjes integral, for any vectors $f$ and $g$ in $H$:

$$(Tf, g) = \int_{-\infty}^{\infty} \lambda \, d(P_\lambda f, g). \tag{2-3}$$



While (2-3) is identical in appearance with (5-7) of Chapter 7, it should be
emphasized that the integral appearing in the latter equation is, in reality,
a finite sum, while the integral in (2-3) is, except in trivial cases (namely,
when the operator $T$ is degenerate), an infinite series. Now, we are
justified in rewriting (2-3) in the following form, identical in appearance
with (5-8) of Chapter 7:

$$T = \int_{-\infty}^{\infty} \lambda \, dP_\lambda. \tag{2-4}$$

The equation (2-4) (which is actually nothing but a reformulation of
(2-2)) is known as the spectral representation of the operator $T$, and the
fact that corresponding to every hermitian completely continuous operator
$T$ there exists a resolution of the identity such that the representation
(2-4) holds is known as the spectral theorem (for such operators).

We conclude this section by indicating, very briefly and imprecisely, 
how this restricted version of the spectral theorem can be extended. By a 
much more delicate argument than that which we have employed, it can 
be shown that, even without the hypothesis of complete continuity, there 
may be associated with any hermitian operator T a resolution of the 
identity such that (2-3) and (2-4) hold. However, in the completely
continuous case the points of increase of $P_\lambda$ (the numbers $\lambda$ such that
$\mathfrak{M}_\mu$ is properly contained in $\mathfrak{M}_\nu$ whenever $\mu < \lambda < \nu$) are isolated, with the possible exception
of zero, while in the general hermitian case the set formed by these points
(which, as might be expected, coincides with the spectrum) may have a 
much more complicated structure. Exercise 8 provides a simple, but not 
entirely trivial, example of the spectral representation of an operator which 
is hermitian but not completely continuous. 

Once the spectral theorem is developed for hermitian operators, it is 
comparatively easy to develop a spectral representation for a normal 
operator as an integral over the complex plane. 

An excellent presentation of the Spectral Theorem will be found in 
[Riesz and Sz.-Nagy, Chapter 7]. 

Exercises 

1. Let $T$ be an operator on a Banach space. The scalar $\lambda$ is said to be
an approximate eigenvalue of the operator $T$ if $\lambda$ is not an
eigenvalue of $T$ but if, for every positive number $\epsilon$, there
exists a non-zero vector $f$ such that $\|Tf - \lambda f\| < \epsilon \|f\|$. Show
that, for the operator $T$ appearing in Exercise 7-3 of Chapter
6, every number in the spectrum is an approximate eigenvalue.

2. Let $T$ be the shift-operator on the Hilbert space $l^2$, defined as
follows: For every vector $(a_1, a_2, a_3, \ldots)$ in $l^2$,

$$T(a_1, a_2, a_3, \ldots) = (0, a_1, a_2, a_3, \ldots).$$

Prove that the number zero belongs to the spectrum of $T$ but
that it is neither an eigenvalue nor an approximate eigenvalue
of $T$.

3. In contrast to the preceding exercise, show that every number in
the spectrum of a hermitian operator is either an eigenvalue
or an approximate eigenvalue. (Note that the shift-operator
is not hermitian; what is its adjoint?)

4. In the Hilbert space $l^2$, give examples of hermitian completely
continuous operators $T_1, T_2, T_3, T_4$ satisfying the following
conditions:

(a) $T_1$ has only a finite number of non-zero eigenvalues.

(b) $T_2$ has an infinite number of non-zero eigenvalues, and the
corresponding eigenvectors span $l^2$.

(c) $T_3$ has an infinite number of non-zero eigenvalues, and the
orthogonal complement of the subspace spanned by the
corresponding eigenvectors is finite-dimensional.

(d) $T_4$ has an infinite number of non-zero eigenvalues and the
orthogonal complement of the subspace spanned by the
corresponding eigenvectors is infinite-dimensional.

5. Show that the procedure described in the text furnishes all the
non-zero eigenvalues with the correct multiplicity.

6. Prove that the spectrum of a hermitian completely continuous
operator (on an infinite-dimensional Hilbert space) must
contain the number zero. If the latter is an isolated point of
the spectrum, it is an eigenvalue, but if it is not isolated it may
be either an eigenvalue or an approximate eigenvalue.

7. Justify equation (2-3). 

8. Let $T$ be the operator appearing in Exercise 7-3 of Chapter 6
and let $p_\lambda(x)$, for each real number $\lambda$ and each number $x$ in
$[0, 1]$, be defined as follows: $p_\lambda(x) \equiv 0$ if $\lambda < 0$, $p_\lambda(x) \equiv 1$
if $\lambda > 1$, and $p_\lambda$ is the characteristic function of the interval
$[0, \lambda]$ if $0 \le \lambda \le 1$. Let $P_\lambda$ denote the operator (on $L^2([0, 1])$)
consisting of multiplication by $p_\lambda$. Show that $\{P_\lambda\}$ is a resolution
of the identity on $L^2([0, 1])$ and that (2-3) (and hence
(2-4)) holds. (Note that the points of increase of $\{P_\lambda\}$ constitute
the entire interval $[0, 1]$, which is precisely the spectrum of $T$.)

§3. THE FREDHOLM ALTERNATIVE 

As in the preceding section, let $T$ be a hermitian completely continuous
operator, let $\lambda$ be a given scalar, and let $g$ be a given vector. Consider
the problem

$$f = g + \lambda Tf. \tag{3-1}$$



(We shall see in the following section a particularly important example of
such a problem.) Taking account of (2-1) and (2-2), we see that (3-1) can
be rewritten as follows:

$$f = g + \sum_k \lambda \mu_k (f, f_k) f_k, \tag{3-2}$$

the summation being performed over all the non-zero eigenvalues of $T$.
If we now replace $f$ on both sides of (3-2) by the right side of this equation,
we obtain (by employing due caution with the indices)

$$g + \sum_k \lambda \mu_k (f, f_k) f_k = g + \sum_k \sum_j \lambda^2 \mu_j \mu_k (f, f_j)(f_j, f_k) f_k + \sum_k \lambda \mu_k (g, f_k) f_k. \tag{3-3}$$

This simplifies, because of the orthonormality of the eigenvectors, to

$$\sum_k \lambda \mu_k (f, f_k) f_k = \sum_k \{\lambda^2 \mu_k^2 (f, f_k) + \lambda \mu_k (g, f_k)\} f_k. \tag{3-4}$$

Since the eigenvectors are linearly independent, we obtain

$$\lambda \mu_k (1 - \lambda \mu_k)(f, f_k) = \lambda \mu_k (g, f_k). \tag{3-5}$$

Now, none of the $\mu_k$'s vanishes, and we may assume that $\lambda \neq 0$, for
otherwise (3-1) becomes utterly trivial. Hence, we may cancel out the
factor $\lambda \mu_k$ from both sides of (3-5), obtaining

$$(1 - \lambda \mu_k)(f, f_k) = (g, f_k). \tag{3-6}$$

Now, there are two possibilities:

(a) $1 - \lambda \mu_k$ never vanishes; i.e., $\lambda$ is not equal to the reciprocal of
any eigenvalue of $T$. In this case, $(f, f_k)$ must equal, for each index $k$, the
quantity $(g, f_k)/(1 - \lambda \mu_k)$, so the only conceivable solution of (3-1)
is obtained by inserting into (3-2) the expression just obtained for $(f, f_k)$.
We thus obtain for the sole possible solution the formula

$$f = g + \sum_k \frac{\lambda \mu_k (g, f_k)}{1 - \lambda \mu_k} f_k. \tag{3-7}$$

(b) $1 - \lambda \mu_k$ vanishes for one or more indices $k$; as we have seen
previously, this can occur for only a finite number of indices. In this case,
we see from (3-6) that there can be no solution of (3-1) unless $(g, f_k)$
vanishes for all the eigenvectors associated with the eigenvalue $1/\lambda$; if
this condition is satisfied, then (3-6) imposes no restriction on the
quantities $(f, f_k)$ for these indices, and so we are led, by again referring to (3-2), to the following
formula for the most general possible solution of (3-1):

$$f = g + \sum_{\lambda \mu_k \neq 1} \frac{\lambda \mu_k (g, f_k)}{1 - \lambda \mu_k} f_k + \sum_{\lambda \mu_k = 1} c_k f_k, \tag{3-8}$$

where the scalars $c_k$ are completely arbitrary.

It still remains to show that the formulas (3-7) and (3-8) are meaningful.
If $T$ possesses only a finite number of non-zero eigenvalues, no
problem of convergence arises, and by direct substitution into (3-1) we see
that (3-7) or (3-8) does indeed provide a solution of (3-1). On the other
hand, if $T$ possesses infinitely many non-zero eigenvalues, the fact that
$\lim_{k\to\infty} \mu_k = 0$, which we proved in §2, enables us to assert that the
inequality $|\lambda \mu_k| < \tfrac{1}{2}$ holds with only a finite number of exceptional values
of $k$ (if any). Thus, setting aside these exceptional values of $k$, we conclude
that the coefficient of $f_k$ appearing in (3-7) or in the infinite summation in
(3-8) is dominated in absolute value by $\tfrac{1}{2}|(g, f_k)|/(1 - \tfrac{1}{2})$, or $|(g, f_k)|$.
Since, by Bessel's inequality, $\sum_k |(g, f_k)|^2$ is convergent, the aforementioned
infinite sums do converge (in the norm of the Hilbert space under
consideration), and hence the right sides of (3-7) and (3-8) are indeed
meaningful. Then, as in the case when $T$ possesses only finitely many
non-zero eigenvalues, we easily confirm that (3-7) or (3-8) does provide
a solution of (3-1). Thus, we have established the following result.


Theorem 1: If $T$ is a hermitian completely continuous operator (on
an infinite-dimensional Hilbert space) and if $\lambda$ is not equal to the reciprocal
of any of the eigenvalues of $T$, then for each vector $g$ the equation (3-1)
possesses one and only one solution, given by (3-7). If $1/\lambda$ is an eigenvalue
of $T$, then the equation (3-1) is solvable iff $g$ is orthogonal to the subspace
$\mathfrak{M}$ (necessarily finite-dimensional) spanned by the eigenvectors associated
with the eigenvalue $1/\lambda$ (in other words, iff $g$ is orthogonal to all solutions of
the equation $Th = (1/\lambda)h$). If $g$ satisfies this condition, (3-1) possesses
infinitely many solutions; exactly one of these is orthogonal to $\mathfrak{M}$, and the
general solution is obtained by adding to this particular solution an arbitrary
vector contained in $\mathfrak{M}$.
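The first case of Theorem 1 can be illustrated with exact arithmetic. The sketch below assumes a hypothetical real symmetric 2×2 matrix with eigenvalues 2 and -3 and the value $\lambda = 1/4$ (which is not the reciprocal of either eigenvalue); it builds the solution from formula (3-7) and then substitutes it back into (3-1). The matrix, the right-hand side, and the function names are illustrative assumptions. Unnormalized eigenvectors $v_k$ are used, so $(g, f_k) f_k$ is written as $(g, v_k)\, v_k/(v_k, v_k)$.

```python
from fractions import Fraction as F

# Hypothetical symmetric operator with eigenvalues 2 and -3.
T = [[F(1), F(2)], [F(2), F(-2)]]
eigenpairs = [(F(2), [F(2), F(1)]), (F(-3), [F(1), F(-2)])]

def apply(T, f):
    """Return the vector T f."""
    return [sum(row[j] * f[j] for j in range(len(f))) for row in T]

def inner(u, v):
    """Real inner product (u, v)."""
    return sum(a * b for a, b in zip(u, v))

def solve_by_expansion(g, lam):
    """Solution of f = g + lam*T f built from the eigen-expansion (3-7)."""
    f = list(g)
    for mu, v in eigenpairs:
        if lam * mu == 1:
            raise ValueError("lam is the reciprocal of an eigenvalue")
        coeff = lam * mu * inner(g, v) / ((1 - lam * mu) * inner(v, v))
        f = [fi + coeff * vi for fi, vi in zip(f, v)]
    return f

g = [F(1), F(0)]
lam = F(1, 4)
f = solve_by_expansion(g, lam)
# Substitute back into (3-1): the right side g + lam*T f should reproduce f.
rhs = [gi + lam * ti for gi, ti in zip(g, apply(T, f))]
print(f, rhs)
```

If `lam` is set to `F(1, 2)` or `F(-1, 3)`, the function raises an error, which corresponds to case (b) of the discussion preceding the theorem.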

If the hypothesis that $T$ is hermitian is omitted, the preceding
argument becomes completely inapplicable; nevertheless, the very
remarkable fact holds that Theorem 1 can be recast into a form which is
valid without this hypothesis. We shall not present the proof, but shall
merely remark that the proof is obtained by beginning with the observation
that the theorem is known by linear algebra to be true in the
finite-dimensional case and then reducing the infinite-dimensional theorem to
the finite-dimensional case by approximating the operator $T$ with
sufficient accuracy by a degenerate operator.

Theorem 2 (Fredholm Alternative in Hilbert Space): Let $T$ be
completely continuous. The equation (3-1) is uniquely solvable for every
vector $g$ iff the equation $f = \lambda Tf$ (corresponding to $g = o$) possesses only
the trivial solution $f = o$. This condition, in turn, is equivalent to the
condition that the adjoint homogeneous equation $f = \bar{\lambda} T^* f$ shall possess
only the trivial solution.

If the equation $f = \lambda Tf$ possesses non-trivial solutions, so does the
equation $f = \bar{\lambda} T^* f$, and the solutions of these two equations form linear
manifolds, $\mathfrak{M}$ and $\mathfrak{M}^*$, of the same (finite) dimension. In this case the
equation (3-1) is solvable iff $g$ is orthogonal to $\mathfrak{M}^*$; when this condition is
satisfied, (3-1) possesses infinitely many solutions, exactly one of which is
orthogonal to $\mathfrak{M}$, and the general solution is obtained by adding to this
particular solution an arbitrary member of $\mathfrak{M}$.
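The role of the adjoint equation in Theorem 2 is easiest to see in a case where $T$ is not hermitian. The following sketch assumes a hypothetical real 2×2 matrix and $\lambda = 1$ (so that $1/\lambda$ is an eigenvalue); the matrix, the sample vectors, and the function names are illustrative assumptions. Because everything is real, the adjoint is simply the transpose. The code checks that every vector of the form $f - \lambda Tf$ is orthogonal to the solution of the adjoint homogeneous equation, and that a right-hand side with this property admits a one-parameter family of solutions.

```python
from fractions import Fraction as F

# Hypothetical non-hermitian operator; 1 is an eigenvalue, so lam = 1 is exceptional.
T = [[F(1), F(1)], [F(0), F(2)]]
lam = F(1)

def apply(M, v):
    """Return the vector M v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def transpose(M):
    """Transpose of M (the adjoint in the real case)."""
    return [list(col) for col in zip(*M)]

def inner(u, v):
    """Real inner product (u, v)."""
    return sum(a * b for a, b in zip(u, v))

def defect(f):
    """Return f - lam*T f, i.e. the g for which f solves f = g + lam*T f."""
    return [fi - lam * ti for fi, ti in zip(f, apply(T, f))]

n = [F(1), F(0)]         # solves f = lam*T f
n_star = [F(1), F(-1)]   # solves the adjoint equation f = lam*T^T f

homogeneous_ok = defect(n) == [F(0), F(0)]
adjoint_ok = apply(transpose(T), n_star) == n_star

# Necessity: every attainable right-hand side is orthogonal to n_star.
samples = [[F(1), F(2)], [F(-3), F(5)], [F(7), F(0)]]
always_orthogonal = all(inner(defect(f), n_star) == 0 for f in samples)

# Sufficiency in this example: g = (3, 3) is orthogonal to n_star.
g = [F(3), F(3)]
f0 = [F(0), F(-3)]                       # the solution orthogonal to n
family = [[f0[0] + t * n[0], f0[1] + t * n[1]] for t in (F(0), F(1), F(-4))]
family_ok = all(defect(f) == g for f in family)
print(homogeneous_ok, adjoint_ok, always_orthogonal, family_ok)
```

A right-hand side such as $g = (1, 0)$, which is not orthogonal to `n_star`, therefore cannot be written as `defect(f)` for any `f`.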

Finally, we discuss briefly the remarkable fact that a major part of 
this theorem can be carried over to Banach spaces. 

Theorem 3 (Fredholm Alternative in Banach Space): Let $T$
be completely continuous. The equation (3-1) is uniquely solvable for every
vector $g$ iff the equation $f = \lambda Tf$ (corresponding to $g = o$) possesses only
the trivial solution $f = o$. If the latter equation possesses a non-trivial
solution, then there exist vectors $g$ for which (3-1) is not solvable.

(Note that the first two sentences of this theorem and of Theorem 2 
are identical.) 

As might be expected, the proof of this theorem is somewhat more
subtle than the proof of the corresponding portion of Theorem 2, since
much of the structural simplicity possessed by Hilbert spaces (centering
around the concept of orthogonality) is lost in making the transition to
Banach spaces. Nevertheless, it is even possible to extend Theorem 3 so
as to provide analogues of the remaining portions of Theorem 2. As
might be expected, the analogues will involve the adjoint operator $T^*$,
which operates on the dual space rather than on the original Banach
space.

Exercises 

1. Use formula (3-7) to show that when $\lambda$ is not the reciprocal of any
eigenvalue of $T$, the transformation $(I - \lambda T)^{-1}$ is bounded.
(Of course, this is also guaranteed by the second corollary to
Theorem 3-1 of Chapter 6.)

2. Show by means of a simple example that none of the theorems
stated in this section remains correct if the condition of complete
continuity is omitted.




§4. SURVEY OF THE FREDHOLM THEORY 
OF INTEGRAL EQUATIONS 

Fredholm's own discovery, which was made early in this century,
antedates the development of the theory of Hilbert and Banach spaces.
The major part of his theorem, as he formulated it, goes as follows: If
$[a, b]$ is a compact interval, if $g$ is a (real-valued or complex-valued)
continuous function on $[a, b]$, if $K$ is a (real-valued or complex-valued)
continuous function on the square $[a, b] \times [a, b]$, and if the only
continuous solution of the equation

$$f(x) = \int_a^b K(x, y) f(y)\, dy \tag{4-1}$$

is the trivial one, $f(x) \equiv 0$, then the equation

$$f(x) = g(x) + \int_a^b K(x, y) f(y)\, dy \tag{4-2}$$

possesses a unique continuous solution.

By referring to Example (d) of §1, we immediately see that Fredholm's
theorem is contained as a particular case of Theorem 3-3. Needless to say,
Fredholm's remarkable discovery served as a major stimulus to the line of
abstract development which culminated in the proof of Theorem 3-3.
It should, however, be pointed out that the proof of Theorem 3-3 does
not run parallel to Fredholm's own argument. Briefly, Fredholm
partitioned the interval $[a, b]$ into subintervals of equal length and
approximated the integral appearing in (4-2) by a sum; he was thus led
to approximate (4-2) by a finite system of linear algebraic equations, and
then with remarkable skill he carried out a passage to the limit in which the
number of subintervals increased without bound. In particular, he dealt
with determinants (via Cramer's rule) of increasing size. The proof of the
abstract theorem makes no reference to determinants, or even to the
finite-dimensional case; the theory of (finite) systems of algebraic
equations drops out as a particular case of the general theorem, as does
Fredholm's own theorem.
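The first step of Fredholm's approach, replacing the integral in (4-2) by a finite sum over equal subintervals, can be sketched numerically. The code below is only an illustration under stated assumptions: it uses the midpoint rule on each subinterval and solves the resulting linear system by Gaussian elimination rather than by the determinants that Fredholm worked with, and the kernel $K(x, y) = xy$ with $g(x) = x$ on $[0, 1]$ is a hypothetical test case (its exact solution, $f(x) = \tfrac{3}{2}x$, is easily found by hand). All function names are invented for this sketch.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fredholm_midpoint(K, g, a, b, n):
    """Approximate f = g + integral of K f on [a, b] using n equal subintervals."""
    h = (b - a) / n
    nodes = [a + (i + 0.5) * h for i in range(n)]
    # Discrete system: f_i - h * sum_j K(x_i, x_j) f_j = g(x_i).
    A = [[(1.0 if i == j else 0.0) - h * K(nodes[i], nodes[j])
          for j in range(n)] for i in range(n)]
    rhs = [g(x) for x in nodes]
    return nodes, solve_linear(A, rhs)

nodes, values = fredholm_midpoint(lambda x, y: x * y, lambda x: x, 0.0, 1.0, 40)
# Compare with the exact solution f(x) = 1.5 * x of this particular test case.
max_error = max(abs(v - 1.5 * x) for x, v in zip(nodes, values))
print(max_error)
```

Increasing `n` corresponds to the passage to the limit described above; the sketch makes no attempt to reproduce the determinant estimates on which Fredholm's actual proof rests.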

We now discuss briefly and heuristically how the Hilbert-space 
formulation of the Fredholm alternative applies to integral equations. A 
rigorous discussion must be based on the theory of Lebesgue integration 
in the plane, which we have only sketched briefly in Chapter 2. In
particular, as will be seen in the following discussion, it is necessary to appeal
frequently to the theorem of Fubini.

Let the function $K$ be measurable and quadratically integrable on the
square $(a, b) \times (a, b)$, which we denote for brevity by $D$. (The interval
$(a, b)$ may be infinite in either or both directions, in contrast to the
Banach-space formulation of the Fredholm theory.) Then for any function $f$
belonging to $L^2((a, b))$ the integral $\int_a^b K(x, y) f(y)\, dy$ exists for almost all
$x$ in $(a, b)$ and the function thus defined, which for obvious reasons is
denoted $\mathbf{K}f$, is also a member of $L^2((a, b))$. Furthermore, by using the
Schwarz inequality in combination with Fubini's theorem, we can show
that $\|\mathbf{K}f\|^2 \le \{\iint_D |K|^2\}\, \|f\|^2$. Thus, the kernel $K$ gives rise to a bounded
linear transformation $\mathbf{K}$ (i.e., an operator) on $L^2((a, b))$. By suitably
approximating the kernel $K$ with polynomials, which are easily seen to
give rise to degenerate operators, and employing part (c) of Theorem 1-1
in combination with the corollary to Theorem 1-3, we then see that the
operator $\mathbf{K}$ is completely continuous, and so Theorem 3-2 may be applied
to the integral equation

$$f(x) = g(x) + \int_a^b K(x, y) f(y)\, dy, \tag{4-3}$$

where the given function $g$ and the unknown function $f$ are to belong to
$L^2((a, b))$.
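The norm bound quoted above can be outlined in two steps. This is a formal sketch only: the measurability questions and the justification for interchanging the order of integration are taken for granted, as in the rest of this heuristic discussion.

```latex
\begin{align*}
|\mathbf{K}f(x)|^2
  &= \left| \int_a^b K(x,y)\, f(y)\, dy \right|^2
   \le \left( \int_a^b |K(x,y)|^2\, dy \right)
       \left( \int_a^b |f(y)|^2\, dy \right)
   && \text{(Schwarz inequality in } y\text{)}, \\
\|\mathbf{K}f\|^2
  &= \int_a^b |\mathbf{K}f(x)|^2\, dx
   \le \left( \int_a^b\!\!\int_a^b |K(x,y)|^2\, dy\, dx \right) \|f\|^2
   = \left\{ \iint_D |K|^2 \right\} \|f\|^2
   && \text{(Fubini's theorem)}.
\end{align*}
```

The first line holds for almost all $x$, which is also where the existence of the integral defining $\mathbf{K}f(x)$ comes from.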

Next, it is almost obvious (cf. Exercise 4-3) that the operator $\mathbf{K}$ is
hermitian if the kernel $K$ is hermitian, that is, if $K(x, y) = \overline{K(y, x)}$ for
almost all points $(x, y)$ in the square $D$. (The converse is also true.)
Now Theorem 3-1 and all of §2 are applicable. In particular, if the
operator $\mathbf{K}$ corresponding to the kernel has only a finite number of
non-zero eigenvalues $\mu_1, \mu_2, \ldots, \mu_N$ (each appearing as frequently as its
multiplicity), with corresponding orthonormal eigenvectors (which are
now termed eigenfunctions) $f_1, f_2, \ldots, f_N$, we readily see from the
developments of §2 that the kernel (also obviously hermitian)

$$R(x, y) = K(x, y) - \sum_{k=1}^{N} \mu_k f_k(x) \overline{f_k(y)} \tag{4-4}$$

must give rise to the zero operator. Now, as might be expected (although
we do not prove it here), this implies that $R(x, y) = 0$ almost everywhere
in $D$, and so from (4-4) we obtain the remarkable bilinear expansion

$$K(x, y) = \sum_{k=1}^{N} \mu_k f_k(x) \overline{f_k(y)} \quad \text{a.e.} \tag{4-5}$$

(Cf. Exercise 4-4.) On the other hand, if $\mathbf{K}$ possesses infinitely many
non-zero eigenvalues $\mu_1, \mu_2, \ldots$ (arranged in descending order of absolute
value, as in §2), it is clear that, for any positive integer $n$, the kernel

$$K(x, y; n) = K(x, y) - \sum_{k=1}^{n} \mu_k f_k(x) \overline{f_k(y)} \tag{4-6}$$

gives rise to an operator $\mathbf{K}_n$ whose norm is precisely $|\mu_{n+1}|$, so that
$\|\mathbf{K}_n\| \to 0$. This property of the operators $\mathbf{K}_n$ suggests very strongly that a
corresponding property must hold for the kernels $K(x, y; n)$; i.e., that as $n$
increases without bound the limiting relation

$$\iint_D |K(x, y; n)|^2 = \iint_D \left| K(x, y) - \sum_{k=1}^{n} \mu_k f_k(x) \overline{f_k(y)} \right|^2 \to 0 \tag{4-7}$$

must hold. This assertion is indeed true, so that we may write the following
generalization of (4-5):

$$K(x, y) = \sum_{k=1}^{\infty} \mu_k f_k(x) \overline{f_k(y)}. \tag{4-8}$$

However, it must be emphasized that (4-8) does not assert that the series
on the right converges pointwise to the left side, but only that the partial
sums of the right side converge to the left side in the norm of $L^2(D)$.
However, as might be expected, there are many important particular
kernels for which (4-8) holds in the sense of pointwise convergence as
well as in the sense of convergence in norm.
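As a numerical illustration of (4-8), one may take the kernel of Exercise 8 below, $K(x, y) = \min\{x, y\}\,[1 - \max\{x, y\}]$ on $(0, 1) \times (0, 1)$, together with the eigenvalues and eigenfunctions stated there in (4-12) and (4-13). The sketch compares the kernel with partial sums of the bilinear series at a single point; the sample point, the numbers of terms, and the function names are arbitrary choices made for this illustration. The eigenfunctions are real, so no complex conjugation is needed.

```python
import math

def kernel(x, y):
    """K(x, y) = min(x, y) * (1 - max(x, y)), the kernel of Exercise 8."""
    return min(x, y) * (1.0 - max(x, y))

def partial_sum(x, y, n):
    """n-th partial sum of sum_k mu_k f_k(x) f_k(y), with
    mu_k = 1/(k*pi)^2 and f_k(x) = sqrt(2) * sin(k*pi*x)."""
    return sum(2.0 * math.sin(k * math.pi * x) * math.sin(k * math.pi * y)
               / (k * math.pi) ** 2
               for k in range(1, n + 1))

x, y = 0.3, 0.6
errors = [abs(kernel(x, y) - partial_sum(x, y, n)) for n in (10, 100, 1000)]
print(errors)
```

For this particular kernel the terms are bounded by $2/(k^2\pi^2)$, so the series converges uniformly; this is one of the cases in which (4-8) holds pointwise as well as in norm.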

We conclude our discussion of the expansion theory of hermitian 
kernels at this point, but we strongly advise that the reader pay careful 
attention to Exercises 5 through 9, in which some of the ideas just 
presented are pushed further. 

Exercises 

1. Let $K$ be continuous on the compact square $[a, b] \times [a, b]$ and
suppose that $|\lambda| \cdot (b - a) \cdot \max |K(x, y)| < 1$. Prove that the
integral equation (4-2) possesses a unique continuous solution
$f$ for each continuous function $g$.

2. Let $K$ be defined and continuous on the closed triangular region
bounded by the $x$-axis, the line $y = x$, and the line $x = a$,
where $a$ is a positive constant. Show that the integral
$\int_0^x K(x, y) f(y)\, dy$ defines a completely continuous operator $\mathbf{K}$
on $C([0, a])$ and that the integral equation

$$f(x) = g(x) + \lambda \int_0^x K(x, y) f(y)\, dy$$

possesses a unique continuous solution in $C([0, a])$ for each
$g$ in $C([0, a])$ and for any value of $\lambda$. (Hint: Prove that for
some sufficiently large integer $n$ the norm of the operator $(\lambda\mathbf{K})^n$
is less than unity.) Equations of this type are known as Volterra
equations.



3. Prove that a hermitian (quadratically integrable) kernel gives
rise to a hermitian operator on $L^2((a, b))$.

4. Let $K(x, y) = \cos(x + y)$ on the square $(0, \pi) \times (0, \pi)$. Show
that the corresponding operator $\mathbf{K}$ has only two non-zero
eigenvalues (each non-degenerate). Determine these
eigenvalues and the corresponding unit eigenvectors (eigenfunctions)
and confirm that (4-5) holds in this case.

Note: The remaining exercises are to be solved in the sense of 
resorting to formal manipulations whose rigorous justification would 
necessitate an adequate command of the theory of Lebesgue integration 
in the plane. 

5. Show that if $K(x, y)$ and $L(x, y)$ belong to $L^2(D)$, then the integral
$\int_a^b K(x, t) L(t, y)\, dt$ defines a kernel $M(x, y)$ which also belongs
to $L^2(D)$, and that the corresponding operators $\mathbf{K}$, $\mathbf{L}$, $\mathbf{M}$
satisfy the relation $\mathbf{M} = \mathbf{K}\mathbf{L}$. Furthermore, show by means
of a specific example that hermitian kernels $K(x, y)$ and $L(x, y)$
may furnish a non-hermitian kernel $M(x, y)$.

6. If $K(x, y)$ is hermitian and belongs to $L^2(D)$, show that, in analogy
with (4-8) (or (4-5)) the relation

$$K_{(2)}(x, y) = \sum_{k=1}^{N \text{ or } \infty} \mu_k^2 f_k(x) \overline{f_k(y)} \tag{4-9}$$

holds, where $K_{(2)}(x, y) = \int_a^b K(x, t) K(t, y)\, dt$. More generally,
show that, with an obvious appropriate definition of $K_{(n)}(x, y)$,
the relation

$$K_{(n)}(x, y) = \sum_{k=1}^{N \text{ or } \infty} \mu_k^n f_k(x) \overline{f_k(y)} \tag{4-10}$$

holds for every positive integer $n$.

7. Employing the notation introduced in the preceding exercise,
show that

$$\iint_D |K_{(n)}(x, y)|^2 = \sum_{k=1}^{N \text{ or } \infty} \mu_k^{2n}. \tag{4-11}$$

8. Let $K(x, y) = \min\{x, y\} \cdot [1 - \max\{x, y\}]$ on the square $(0, 1) \times
(0, 1)$. Show that the eigenvalues of the corresponding
operator are all simple and positive, being given by the formula

$$\mu_k = \frac{1}{k^2 \pi^2} \quad (k = 1, 2, 3, \ldots), \tag{4-12}$$

while the corresponding unit eigenvectors are given by the
functions

$$f_k(x) = \sqrt{2} \sin k\pi x. \tag{4-13}$$

9. Work out the numerical value of the left side of (4-11) for the
cases $n = 1$ and $n = 2$, where $K(x, y)$ is the kernel appearing
in Exercise 8. Derive from your calculations the identities

$$\sum_{k=1}^{\infty} \frac{1}{k^4} = \frac{\pi^4}{90}, \qquad \sum_{k=1}^{\infty} \frac{1}{k^8} = \frac{\pi^8}{9450}. \tag{4-14}$$


§5. ESTIMATION OF THE LARGEST EIGENVALUE 

From equations (2-1) and (2-2) it is almost immediately evident that
the discussion of Kellogg's method which was presented in the preceding
chapter carries over with no essential changes to the infinite-dimensional
case, except that we must now impose on the hermitian operator $T$ the
condition of complete continuity. We illustrate this procedure with the
operator determined by the kernel $K(x, y)$ defined in Exercise 4-8.
According to Exercise 1, the norm of the operator $\mathbf{K}$ determined by this
kernel is certainly an eigenvalue, and the (essentially unique) eigenfunction
$f_1$ corresponding to this eigenvalue may be taken to be everywhere real
and non-negative, so that any non-zero constant function (which we may
take to be $f \equiv 1$) is certainly not orthogonal to $f_1$; since $-\|\mathbf{K}\|$ is certainly
not an eigenvalue (again according to Exercise 1), we may be assured that
the sequence of numbers $(\mathbf{K}^{n+1}f, f)/(\mathbf{K}^n f, f)$ converges to $\|\mathbf{K}\|$, while the
positive character of the operator $\mathbf{K}$ (cf. Exercise 4-8) guarantees the
monotone convergence of the aforementioned sequence to $\|\mathbf{K}\|$.

Now, by elementary calculations (which the reader should carry out
for his own benefit) we obtain, with the aforementioned choice of the
function $f$, the following results:

$$\mathbf{K}f(x) = \tfrac{1}{2}(x - x^2), \quad \mathbf{K}^2 f(x) = \tfrac{1}{24}(x - 2x^3 + x^4), \quad \mathbf{K}^3 f(x) = \tfrac{1}{720}(3x - 5x^3 + 3x^5 - x^6). \tag{5-1}$$

Next, we easily obtain

$$(f, f) = 1, \quad (\mathbf{K}f, f) = \tfrac{1}{12}, \quad (\mathbf{K}^2 f, f) = \tfrac{1}{120}, \quad (\mathbf{K}^3 f, f) = \tfrac{17}{20160}. \tag{5-2}$$

From these results we obtain, in turn,

$$\frac{(\mathbf{K}f, f)}{(f, f)} = \frac{1}{12}, \quad \frac{(\mathbf{K}^2 f, f)}{(\mathbf{K}f, f)} = \frac{1}{10}, \quad \frac{(\mathbf{K}^3 f, f)}{(\mathbf{K}^2 f, f)} = \frac{17}{168}, \tag{5-3}$$

and we observe that these ratios are increasing. It is interesting to note
that $(1/\pi^2)/(17/168) = 1.00129\ldots$, so that in only three stages the Kellogg
procedure has furnished an estimate for the largest eigenvalue which is
accurate within almost 0.1 percent, despite the fact that the constant
function $f$ is a rather poor approximation for the eigenfunction
corresponding to the largest eigenvalue.
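The three Kellogg ratios can be reproduced with exact rational arithmetic. The sketch below relies on a standard fact that the text does not state explicitly and that is therefore an assumption here: for the kernel $\min\{x, y\}[1 - \max\{x, y\}]$, the function $u = \mathbf{K}p$ is the solution of $-u'' = p$ with $u(0) = u(1) = 0$. Polynomials are stored as coefficient lists, and the function names are invented for this illustration.

```python
from fractions import Fraction as F

def apply_K(p):
    """Apply K to the polynomial sum(p[i] * x**i) and return the coefficients of Kp.

    Assumes u = Kp solves -u'' = p with u(0) = u(1) = 0.
    """
    # Double antiderivative of p that vanishes, with its derivative, at x = 0.
    q = [F(0), F(0)] + [c / ((i + 1) * (i + 2)) for i, c in enumerate(p)]
    slope = sum(q)             # value of the double antiderivative at x = 1
    u = [-c for c in q]
    u[1] += slope              # linear term chosen so that u(1) = 0
    return u

def integral01(p):
    """Integral of the polynomial p over (0, 1); equals (p, f) when f = 1."""
    return sum(c / (i + 1) for i, c in enumerate(p))

f = [F(1)]                     # the constant trial function f = 1
moments = [integral01(f)]      # (f, f), (Kf, f), (K^2 f, f), (K^3 f, f)
p = f
for _ in range(3):
    p = apply_K(p)
    moments.append(integral01(p))

ratios = [moments[n + 1] / moments[n] for n in range(3)]
print(ratios)
```

After the loop, `p` holds the coefficients of $\mathbf{K}^3 f$, which can be compared term by term with the last expression in (5-1).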

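Continuing our calculations a little further, we obtain

$$\frac{\mathbf{K}f}{\|\mathbf{K}f\|} = \sqrt{30}\,(x - x^2), \quad \frac{\mathbf{K}^2 f}{\|\mathbf{K}^2 f\|} = \sqrt{\frac{630}{31}}\,(x - 2x^3 + x^4), \quad \frac{\mathbf{K}^3 f}{\|\mathbf{K}^3 f\|} = \sqrt{\frac{12012}{5461}}\,(3x - 5x^3 + 3x^5 - x^6), \tag{5-4}$$

and also, recalling that the leading eigenfunction $f_1$ is given by $f_1(x) =
\sqrt{2} \sin \pi x$, we find, by a tedious but elementary computation, that

$$\left\| f_1 - \frac{\mathbf{K}^3 f}{\|\mathbf{K}^3 f\|} \right\|^2 = 2 - \frac{5760}{\pi^7} \left( \frac{6006}{5461} \right)^{1/2} = 0.20924 \cdot 10^{-6}, \tag{5-5}$$

or

$$\left\| f_1 - \frac{\mathbf{K}^3 f}{\|\mathbf{K}^3 f\|} \right\| = 0.4574 \cdot 10^{-3}. \tag{5-6}$$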

Now, instead of choosing a trial function at random, as we did
previously, we turn to the problem of making an optimum choice of the
trial function within a specified class of candidates. From elementary
considerations of symmetry, it is clear that the (essentially unique)
eigenfunction of the kernel under discussion here must be symmetric with
respect to the point $\tfrac{1}{2}$; i.e., $f_1(\tfrac{1}{2} + x) = f_1(\tfrac{1}{2} - x)$. Let us therefore
consider all quadratic polynomials which possess this symmetry property.
Since the unique critical value of such a function must occur at $\tfrac{1}{2}$, we see
that it must be of the form $a + bx - bx^2$. An elementary calculation
furnishes the result

$$\frac{(\mathbf{K}f, f)}{(f, f)} = \frac{420a^2 + 17b^2 + 168ab}{168(30a^2 + b^2 + 10ab)}, \tag{5-7}$$

or

$$\frac{(\mathbf{K}f, f)}{(f, f)} = \frac{420 + 168c + 17c^2}{168(30 + 10c + c^2)}, \qquad c = \frac{b}{a}. \tag{5-8}$$

A simple calculation shows that this fraction* is maximized by choosing
for $c$ the value $-45 - \sqrt{1605}$, and that with this choice of $c$ equation (5-8)
assumes the form

$$\frac{(\mathbf{K}f, f)}{(f, f)} = \frac{4815 + 107\sqrt{1605}}{89880} = 0.1012648\cdots. \tag{5-9}$$

* Fractions of the form $(\mathbf{K}f, f)/(f, f)$ are frequently termed Rayleigh quotients,
particularly when they are employed in estimating eigenvalues.

Taking account of the characterization of $\lambda_1$ as the maximum of the
fraction $(\mathbf{K}f, f)/(f, f)$ for all possible choices of $f$ in $L^2((0, 1))$ (subject, of
course, to the restriction $\|f\| > 0$), we conclude that

$$\lambda_1 \ge 0.1012648\cdots. \tag{5-10}$$

It is interesting to note that $\lambda_1 = 1/\pi^2 = 0.101321\ldots$, so that
consideration of a very simple class of trial functions has led us to an estimate for
the largest eigenvalue which is in error by only 0.05 percent, a result
better than that obtained with three stages of the Kellogg procedure
(beginning, however, with a less judicious choice of trial function).
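The optimization over the one-parameter family can be checked numerically. The sketch below evaluates the quotient (5-8) at the stated value of $c$, compares it with the closed form (5-9) and with $1/\pi^2$, and confirms that nearby values of $c$ give a smaller quotient. The function name and the step used for the comparison are arbitrary choices made for this illustration.

```python
import math

def rayleigh(c):
    """The Rayleigh quotient (5-8) for the trial function 1 + c*(x - x**2)."""
    return (420 + 168 * c + 17 * c * c) / (168 * (30 + 10 * c + c * c))

c_star = -45 - math.sqrt(1605)
best = rayleigh(c_star)
closed_form = (4815 + 107 * math.sqrt(1605)) / 89880
lambda_1 = 1 / math.pi ** 2

# Relative shortfall of the Rayleigh-Ritz estimate below the true eigenvalue.
relative_error = (lambda_1 - best) / lambda_1
# Moving c away from c_star in either direction should lower the quotient.
is_local_max = best > rayleigh(c_star - 1.0) and best > rayleigh(c_star + 1.0)
print(best, closed_form, relative_error, is_local_max)
```

The quotient also has a second critical point, at $c = -45 + \sqrt{1605}$, where it takes its minimum; only the value used above gives the maximum.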

The method of employing a family of trial functions containing one 
or more parameters and making an optimum choice of the parameter(s) 
is known as the Rayleigh-Ritz procedure, and it can often be used very 
effectively in estimating the highest eigenvalue (and, with suitable modi- 
fications, lower eigenvalues) of hermitian kernels. 

Finally, we remark that, once the Rayleigh-Ritz procedure has
furnished a satisfactory trial function, the latter may be effectively
employed in the Kellogg procedure.


Exercises

1. Let $K(x, y)$ be real-valued, quadratically integrable, and
symmetric ($K(x, y) = K(y, x)$ a.e.) on a square $(a, b) \times (a, b)$,
and suppose furthermore that $K(x, y)$ is positive everywhere
(or almost everywhere) on the square. Show that $\|\mathbf{K}\|$ is an
eigenvalue of the corresponding operator $\mathbf{K}$, that this
eigenvalue is simple, that associated with this eigenvalue there is a
non-negative eigenfunction, and that $-\|\mathbf{K}\|$ is not an
eigenvalue. (Hint: Note the strong resemblance to Exercise 3-2
of Chapter 7.)

2. Let $f(x) = \gamma\{1 - (45 + \sqrt{1605})(x - x^2)\}$, where the real number
$\gamma$ is to be chosen so that $\|f(x)\| = 1$. (Note that the sign of $\gamma$
is left ambiguous.) Compute $\|\sqrt{2} \sin \pi x - f(x)\|$, where the
sign of $\gamma$ is chosen so as to make this norm close to zero.
(Clearly, the opposite choice of $\gamma$ will make the norm close
to 2.)




APPENDICES 


A. PARTIALLY ORDERED SETS 
AND ZORN’S LEMMA 

Let A be any non-empty set and let P be any non-empty subset of
$A \times A$ (i.e., P is a relation on A). If the elements x and y of A are such
that the ordered pair (x, y) belongs to P, we shall write xPy. (In particular,
P may coincide with $A \times A$, in which case the statement xPy is true for
every choice of x and y.) The elements x and y (not necessarily distinct)
are said to be comparable if at least one of the statements xPy, yPx is true;
otherwise they are said to be incomparable. The relation P is said to be a
partial ordering (of A) if the following three conditions are satisfied:

(a) xPx for every x (axiom of reflexivity);

(b) If xPy and yPx, then x = y (axiom of antisymmetry);

(c) If xPy and yPz, then xPz (axiom of transitivity).

The simplest and most important example of a partial ordering is the
relation $\le$ on the real number system R. (For this reason, it is customary
to use the symbol $\le$ for any partial ordering, but we prefer to employ the
non-committal symbol P.) Note carefully that in this case any two elements
are comparable. On the other hand, if we still consider R but interpret
xPy to mean that $y - x$ is a non-negative rational number, then P is
indeed a partial ordering, but the numbers 1 and $\sqrt{2}$ are incomparable.
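The three axioms can be checked mechanically on a finite set. The following is a minimal sketch, not part of the original text: the helper names `is_partial_order`, `comparable`, and `divides` are hypothetical, and divisibility on {1, ..., 12} is used as an assumed illustrative relation in which some pairs are incomparable.

```python
from itertools import product

def is_partial_order(elements, rel):
    """Check axioms (a)-(c) for the relation rel on a finite collection."""
    elements = list(elements)
    reflexive = all(rel(x, x) for x in elements)
    antisymmetric = all(
        x == y or not (rel(x, y) and rel(y, x))
        for x, y in product(elements, repeat=2)
    )
    transitive = all(
        rel(x, z) or not (rel(x, y) and rel(y, z))
        for x, y, z in product(elements, repeat=3)
    )
    return reflexive and antisymmetric and transitive

def comparable(x, y, rel):
    """x and y are comparable if at least one of xPy, yPx holds."""
    return rel(x, y) or rel(y, x)

def divides(x, y):
    """Assumed example relation: xPy means that x divides y."""
    return y % x == 0

A = range(1, 13)
print(is_partial_order(A, divides))       # divisibility satisfies (a)-(c)
print(comparable(2, 6, divides))          # 2 divides 6
print(comparable(2, 3, divides))          # neither divides the other
```

Taking P to be all of $A \times A$ on a set with two or more elements fails axiom (b), which the same checker detects.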

The set A, together with the partial ordering P, is termed a partially
ordered set, or poset. (Strictly speaking one should describe a poset as a
pair of objects {A, P}, but, as in many analogous situations, we speak
simply of the poset A, suppressing reference to the partial ordering P once



10. Prove that at least one maximal ideal exists; in fact, show that 
any proper non-maximal ideal is contained in at least one 
maximal ideal. (Hint: Introduce a suitable partial ordering 
into the class of all proper ideals and show that the hypothesis 
of Zorn’s lemma is satisfied.) 


B. CONCERNING THE SPECTRUM OF AN 
OPERATOR ON A COMPLEX 
BANACH SPACE 


Let T be any operator on the complex Banach space B. We wish to
show that $\sigma(T)$ is not empty, i.e., that there exists at least one complex
number $\mu$ such that $(\mu I - T)^{-1}$ does not exist. Suppose, on the contrary,
that $(\mu I - T)^{-1}$ exists for all $\mu$. Then, in particular, $T^{-1}$ exists, and
so, if f is any non-zero vector, chosen once and for all, $(-T)^{-1}f$ is a well-
defined non-zero vector, and so there exists a bounded linear functional l
on B such that the function $g(\mu) = l((\mu I - T)^{-1}f)$ is defined for all
complex numbers $\mu$ and does not vanish when $\mu = 0$. Now, for $|\mu| > \|T\|$
we may write

$$(\mu I - T)^{-1} = \frac{I}{\mu} + \frac{T}{\mu^2} + \frac{T^2}{\mu^3} + \cdots,$$

and so (cf. Exercise 1)

$$g(\mu) = l\left(\frac{1}{\mu} f + \frac{1}{\mu^2} Tf + \frac{1}{\mu^3} T^2 f + \cdots\right)
= \frac{1}{\mu}\, l(f) + \frac{1}{\mu^2}\, l(Tf) + \frac{1}{\mu^3}\, l(T^2 f) + \cdots.$$

Therefore,

$$|g(\mu)| \le \frac{\|l\| \cdot \|f\|}{|\mu|} + \frac{\|l\| \cdot \|T\| \cdot \|f\|}{|\mu|^2} + \frac{\|l\| \cdot \|T\|^2 \cdot \|f\|}{|\mu|^3} + \cdots
= \frac{\|l\| \cdot \|f\|}{|\mu|}\left(1 + \frac{\|T\|}{|\mu|} + \frac{\|T\|^2}{|\mu|^2} + \cdots\right)
= \frac{\|l\| \cdot \|f\|}{|\mu| - \|T\|}.$$

Thus, $|g(\mu)|$ approaches zero uniformly as $|\mu| \to \infty$.





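Next we observe that for any complex numbers $\delta$ and $\mu$,

$$g(\mu + \delta) - g(\mu) = l\big(((\mu + \delta)I - T)^{-1}f\big) - l\big((\mu I - T)^{-1}f\big)$$

$$= l\big(((\mu + \delta)I - T)^{-1}f - (\mu I - T)^{-1}f\big)$$

$$= l\big(\{(\mu I - T)(I + \delta(\mu I - T)^{-1})\}^{-1}f - (\mu I - T)^{-1}f\big).$$

Fixing $\mu$, let $0 < |\delta| < \|(\mu I - T)^{-1}\|^{-1}$. Then we may, by Theorem 7-3
of Chapter 6, write

$$g(\mu + \delta) - g(\mu)
= l\big(\{(\mu I - T)^{-1} - \delta(\mu I - T)^{-2} + \delta^2(\mu I - T)^{-3} - \cdots\}f - (\mu I - T)^{-1}f\big)$$

$$= -\delta\, l\big((\mu I - T)^{-2}f\big) + \delta^2\, l\big((\mu I - T)^{-3}f\big) - \cdots.$$

Hence,

$$\left|\frac{g(\mu + \delta) - g(\mu)}{\delta} + l\big((\mu I - T)^{-2}f\big)\right|
\le \|l\| \sum_{k=1}^{\infty} |\delta|^k\, \|(\mu I - T)^{-1}\|^{k+2}\, \|f\|
= \frac{\|l\| \cdot \|f\| \cdot \|(\mu I - T)^{-1}\|^3\, |\delta|}{1 - |\delta| \cdot \|(\mu I - T)^{-1}\|}.$$

Thus, $\lim_{\delta \to 0} (g(\mu + \delta) - g(\mu))/\delta$ exists for every $\mu$ (and equals
$-l((\mu I - T)^{-2}f)$). Therefore $g(\mu)$ is an entire function which approaches
zero as $|\mu| \to \infty$. By Liouville's theorem, $g(\mu) \equiv 0$, contrary to the
assumption that $g(0) \ne 0$. Hence, $\sigma(T)$ cannot be vacuous.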

Note that, in assuming the existence of a non-zero vector f, we have
tacitly excluded the trivial case that the space B consists only of the zero
vector.
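The series expansion of $(\mu I - T)^{-1}$ used in the argument can be checked numerically in the finite-dimensional case. The sketch below is an illustration under assumptions not in the original: T is a hypothetical 2 x 2 matrix, the helper names are invented, and $\mu$ is chosen with $|\mu|$ larger than the norm of T so that the series converges.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def resolvent_series(T, mu, terms):
    """Partial sum I/mu + T/mu^2 + ... + T^(terms-1)/mu^terms."""
    n = len(T)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [[0.0] * n for _ in range(n)]
    for k in range(terms):
        coeff = 1 / mu ** (k + 1)
        for i in range(n):
            for j in range(n):
                total[i][j] += coeff * power[i][j]
        power = mat_mul(power, T)      # next power of T
    return total

def resolvent_exact_2x2(T, mu):
    """Inverse of mu*I - T for a 2 x 2 matrix, by the adjugate formula."""
    a, b = mu - T[0][0], -T[0][1]
    c, d = -T[1][0], mu - T[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

T = [[0.5, 1.0], [0.0, -0.25]]         # assumed example operator
mu = 3 + 1j                            # |mu| exceeds the norm of T
approx = resolvent_series(T, mu, 60)
exact = resolvent_exact_2x2(T, mu)
error = max(abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2))
print(error)

# The entries of the resolvent become small as |mu| grows.
far = resolvent_exact_2x2(T, 1000.0)
print(max(abs(far[i][j]) for i in range(2) for j in range(2)))
```

In finite dimensions the non-emptiness of the spectrum also follows from the existence of a root of the characteristic polynomial; the sketch only illustrates the expansion and the decay of the resolvent.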


Exercise 

1. Justify the term-by-term application of l to the various infinite
sums which appear in the argument.


C. THE STIELTJES INTEGRAL 

Let f and g be real-valued functions defined on the finite closed
interval [a, b], f being continuous and g non-decreasing (but not necessarily
continuous). Let $\Pi$ be any partition of [a, b], i.e., an ordered set of
numbers $\{x_0, x_1, x_2, \ldots, x_n\}$ such that $a = x_0 < x_1 < x_2 < \cdots < x_n = b$,
$\|\Pi\|$, the norm of $\Pi$, being defined as the largest of the numbers
$x_k - x_{k-1}$, $1 \le k \le n$. Let numbers $\xi_1, \xi_2, \ldots, \xi_n$ be chosen arbitrarily in
the intervals $[x_0, x_1], [x_1, x_2], \ldots, [x_{n-1}, x_n]$ respectively. The sum
$\sum_{k=1}^{n} f(\xi_k)(g(x_k) - g(x_{k-1}))$ is termed a Riemann-Stieltjes sum. Since f is
uniformly continuous, it is possible, given any positive number $\epsilon$, to find
a positive number $\delta$ such that $|f(x) - f(x')| < \epsilon/(g(b) - g(a))$ whenever
$|x - x'| < \delta$. Hence, whenever $\|\Pi\|$ is less than $\delta$, replacing the numbers
$\xi_1, \xi_2, \ldots, \xi_n$ by another admissible set of numbers $\xi_1', \xi_2', \ldots, \xi_n'$
alters the Riemann-Stieltjes sum by less than $\epsilon$, for the absolute value of
the difference of the two sums is at most
$\sum_{k=1}^{n} |f(\xi_k) - f(\xi_k')|\,(g(x_k) - g(x_{k-1}))$, which is less than
$(\epsilon/(g(b) - g(a))) \sum_{k=1}^{n} (g(x_k) - g(x_{k-1})) = (\epsilon/(g(b) - g(a)))(g(b) - g(a)) = \epsilon$.
(The monotonicity of g is employed
here, of course.) Then, exactly as in the development of the theory of the
Riemann integral, it follows that if we choose a sequence of partitions
whose norms approach zero and form a corresponding Riemann-Stieltjes
sum for each of the partitions, the sums converge to a limit which is
independent of the choice of the partitions and of the $\xi$'s within each
subinterval of each partition. This limit is thus completely determined by
the functions f and g, and is denoted $\int_a^b f\,dg$. (In particular, if g(x) = x,
$\int_a^b f\,dg$ obviously coincides with the Riemann integral $\int_a^b f(x)\,dx$.)

Next, let $g_1$ and $g_2$ be two non-decreasing functions. It is easily seen
that

$$\int_a^b f\,d(g_1 + g_2) = \int_a^b f\,dg_1 + \int_a^b f\,dg_2. \tag{C-1}$$

If h is expressible as the difference of two non-decreasing functions, say
$h = g_1 - g_2$, and also as the difference of two other non-decreasing
functions, $h = g_3 - g_4$, we obtain from (C-1) the equality

$$\int_a^b f\,dg_1 + \int_a^b f\,dg_4 = \int_a^b f\,dg_3 + \int_a^b f\,dg_2. \tag{C-2}$$

From (C-2) we obtain, in turn,

$$\int_a^b f\,dg_1 - \int_a^b f\,dg_2 = \int_a^b f\,dg_3 - \int_a^b f\,dg_4. \tag{C-3}$$

Thus, we are justified in denoting the common value of both sides of
(C-3) as $\int_a^b f\,dh$. The reader is perhaps acquainted with the fact that the
(real-valued) function h is expressible as the difference of two non-
decreasing functions iff h is of bounded variation; by this we mean that
there exists a positive number C such that for every partition $\Pi$ of the
interval [a, b] the sum $\sum_{k=1}^{n} |h(x_k) - h(x_{k-1})|$ does not exceed C.

Now let f and h be complex-valued functions, $f = f_1 + if_2$, $h = h_1 + ih_2$,
and suppose furthermore that $f_1$ and $f_2$ are both continuous and that
$h_1$ and $h_2$ are both of bounded variation. Then, as is to be expected, we
define $\int_a^b f\,dh$ as follows:

$$\int_a^b f\,dh = \int_a^b f_1\,dh_1 - \int_a^b f_2\,dh_2 + i\int_a^b f_1\,dh_2 + i\int_a^b f_2\,dh_1. \tag{C-4}$$





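Finally, it is almost trivial to show that Stieltjes integration is linear, i.e.,
if f and F are continuous and h and H satisfy the preceding conditions,
then for any scalars $\alpha_1, \alpha_2, \beta_1, \beta_2$ the integral
$\int_a^b (\alpha_1 f + \alpha_2 F)\,d(\beta_1 h + \beta_2 H)$
is well defined and satisfies the equality

$$\int_a^b (\alpha_1 f + \alpha_2 F)\,d(\beta_1 h + \beta_2 H) = \alpha_1\beta_1 \int_a^b f\,dh
+ \alpha_1\beta_2 \int_a^b f\,dH + \alpha_2\beta_1 \int_a^b F\,dh + \alpha_2\beta_2 \int_a^b F\,dH. \tag{C-5}$$

In particular, it should be noted that if h is a step-function having
discontinuities only at the points $\eta_1, \eta_2, \ldots, \eta_n$ inside the interval [a, b],
then $\int_a^b f\,dh$ assumes the simple form $\sum_{k=1}^{n} f(\eta_k)\{h(\eta_k + 0) - h(\eta_k - 0)\}$.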
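The definition of the Riemann-Stieltjes sum and the step-function formula can be checked numerically. The following is a minimal sketch under assumptions not in the original: the function names are hypothetical, the partition is uniform, the tags are midpoints, and the integrand and step function are illustrative choices.

```python
def stieltjes_sum(f, g, a, b, n):
    """Riemann-Stieltjes sum of f with respect to g on a uniform partition
    of [a, b] into n subintervals, using midpoints as the tags."""
    total = 0.0
    for k in range(1, n + 1):
        x_prev = a + (b - a) * (k - 1) / n
        x_k = a + (b - a) * k / n
        xi = 0.5 * (x_prev + x_k)
        total += f(xi) * (g(x_k) - g(x_prev))
    return total

def f(x):
    return x * x

def identity(x):
    return x

def step(x):
    """Assumed step function: jump of 2 at x = 0.3 and jump of 1 at x = 0.7."""
    return (2.0 if x >= 0.3 else 0.0) + (1.0 if x >= 0.7 else 0.0)

# With g(x) = x the sum approximates the Riemann integral of x^2 over [0, 1].
riemann = stieltjes_sum(f, identity, 0.0, 1.0, 1000)

# With a step function the sum approximates 2 f(0.3) + 1 f(0.7).
jumps = stieltjes_sum(f, step, 0.0, 1.0, 1000)
expected_jumps = 2.0 * f(0.3) + 1.0 * f(0.7)
print(riemann, jumps, expected_jumps)
```

Refining the partition brings the second sum closer to the value given by the step-function formula, since only the subintervals containing a jump contribute.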

Exercise 

1. Suppose that both f and h are real-valued, continuous, and non-
decreasing. Prove that

$$\int_a^b f\,dh + \int_a^b h\,df = f(b)h(b) - f(a)h(a).$$


D. THE WEIERSTRASS APPROXIMATION THEOREM 
AND APPROXIMATION BY 
TRIGONOMETRIC POLYNOMIALS 

The Weierstrass theorem is one of the most important approximation 
theorems in classical analysis, and it has served as a point of departure 
for vast generalizations. However, we shall present only the original 
version of the theorem, and out of the vast number of distinct proofs 
which are known we shall present one which admits an almost obvious 
generalization to the multi-dimensional case. (Cf. Exercise 1). 

Theorem 1: Let f be a (real-valued or complex-valued) continuous
function defined on a compact interval [a, b]. Given any positive number $\epsilon$,
there exists a polynomial p such that the inequality $|p(x) - f(x)| < \epsilon$ holds
throughout the indicated interval.

Proof: First we consider the particular case that f(a) = f(b) = 0.
We extend the function f to all of R by defining f(x) to be zero outside the
interval [a, b]. Clearly, the extended function thus defined is uniformly
continuous, despite the fact that R is not compact. For any positive
number t we define the function $f_t$ on all of R as follows:

$$f_t(x) = c(t) \int_{-\infty}^{\infty} f(\xi)\, e^{-(\xi - x)^2/t}\, d\xi, \tag{D-1}$$

where $c(t) = \left(\int_{-\infty}^{\infty} e^{-\xi^2/t}\, d\xi\right)^{-1} = (\pi t)^{-1/2}$. Since, from the definition of
c(t), the equality

$$f(x) = c(t) \int_{-\infty}^{\infty} f(x)\, e^{-(\xi - x)^2/t}\, d\xi$$

holds for all x, we obtain

$$f_t(x) - f(x) = c(t) \int_{-\infty}^{\infty} (f(\xi) - f(x))\, e^{-(\xi - x)^2/t}\, d\xi. \tag{D-2}$$

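Taking account of the uniform continuity of f, we can choose $\delta(\epsilon)$ such
that $|f(\xi) - f(x)| < \epsilon/3$ whenever $|\xi - x| < \delta$. From (D-2) we obtain

$$|f_t(x) - f(x)| \le c(t) \int_{x-\delta}^{x+\delta} \frac{\epsilon}{3}\, e^{-(\xi - x)^2/t}\, d\xi + c(t) \int_{|\xi - x| \ge \delta} 2M\, e^{-(\xi - x)^2/t}\, d\xi, \tag{D-3}$$

where $M = \max_{x \in R} |f(x)| = \max_{a \le x \le b} |f(x)|$. From (D-3) we easily
obtain the further inequality

$$|f_t(x) - f(x)| < \frac{\epsilon}{3} + 4M c(t) \int_{x+\delta}^{\infty} e^{-(\xi - x)^2/t}\, d\xi
= \frac{\epsilon}{3} + 4M c(t)\, t^{1/2} \int_{\delta/\sqrt{t}}^{\infty} e^{-u^2}\, du
= \frac{\epsilon}{3} + \frac{4M}{\sqrt{\pi}} \int_{\delta/\sqrt{t}}^{\infty} e^{-u^2}\, du.$$

We now choose t so small that the inequality

$$\frac{4M}{\sqrt{\pi}} \int_{\delta/\sqrt{t}}^{\infty} e^{-u^2}\, du < \frac{\epsilon}{3} \tag{D-4}$$

holds; for this choice of t we then have, for all x, the inequality

$$|f_t(x) - f(x)| < \frac{2\epsilon}{3}. \tag{D-5}$$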

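Referring to (D-1) and confining x to the interval [a, b], we obtain

$$f_t(x) = c(t) \int_a^b f(\xi) \left\{ \sum_{k=0}^{N} \frac{(-1)^k (\xi - x)^{2k}}{t^k k!} + \sum_{k=N+1}^{\infty} \frac{(-1)^k (\xi - x)^{2k}}{t^k k!} \right\} d\xi. \tag{D-6}$$

Taking account of the fact that when $\xi$ and x are both confined to the
interval [a, b] the inequality $|\xi - x| \le b - a$ must hold, we obtain from
(D-6) the inequality

$$\left| f_t(x) - c(t) \int_a^b f(\xi) \left\{ \sum_{k=0}^{N} \frac{(-1)^k (\xi - x)^{2k}}{t^k k!} \right\} d\xi \right|
\le M c(t) (b - a) \sum_{k=N+1}^{\infty} \frac{(b - a)^{2k}}{t^k k!}. \tag{D-7}$$

Now we choose N so large that the right side of (D-7) (which is independent
of x) is less than $\epsilon/3$. Then from (D-5) and (D-7) we obtain

$$\left| f(x) - c(t) \int_a^b f(\xi) \left\{ \sum_{k=0}^{N} \frac{(-1)^k (\xi - x)^{2k}}{t^k k!} \right\} d\xi \right|
\le |f(x) - f_t(x)| + \left| f_t(x) - c(t) \int_a^b f(\xi) \left\{ \sum_{k=0}^{N} \frac{(-1)^k (\xi - x)^{2k}}{t^k k!} \right\} d\xi \right|
< \frac{2\epsilon}{3} + \frac{\epsilon}{3} = \epsilon. \tag{D-8}$$

Now we observe that

$$c(t) \int_a^b f(\xi) \left\{ \sum_{k=0}^{N} \frac{(-1)^k (\xi - x)^{2k}}{t^k k!} \right\} d\xi$$

is a polynomial (of degree not exceeding 2N), and so the proof is complete
for the case that f(a) = f(b) = 0.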

In the general case, we may obviously find scalars $\alpha$ and $\beta$ such that
$f(x) - \alpha x - \beta$ vanishes at a and b. We then determine a polynomial q
such that $|\{f(x) - \alpha x - \beta\} - q(x)| < \epsilon$ for all x in [a, b]. Then the
polynomial $\alpha x + \beta + q(x)$ differs from f(x) by less than $\epsilon$ throughout
the prescribed interval.
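The first step of the proof, the uniform approximation of f by the smoothed function $f_t$ of (D-1), can be observed numerically. The sketch below is an illustration under assumptions not in the original: the interval is taken to be [0, 1], the test function x(1 - x) is an arbitrary choice vanishing at both end-points, the integral is replaced by a midpoint rule, and the supremum is taken over a finite grid. The helper names are hypothetical.

```python
import math

def f(x):
    """Continuous on R and zero outside [0, 1], so that f(0) = f(1) = 0."""
    return x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0

def smoothed(x, t, m=2000):
    """Approximate f_t(x) = c(t) * integral of f(xi) exp(-(xi - x)^2 / t) d(xi),
    with c(t) = (pi t)^(-1/2). Since f vanishes outside [0, 1], the integral
    is taken over [0, 1] only (midpoint rule with m subintervals)."""
    c = 1.0 / math.sqrt(math.pi * t)
    h = 1.0 / m
    total = 0.0
    for j in range(m):
        xi = (j + 0.5) * h
        total += f(xi) * math.exp(-(xi - x) ** 2 / t)
    return c * total * h

def sup_error(t, points=21):
    """Largest value of |f_t(x) - f(x)| over an evenly spaced grid in [0, 1]."""
    xs = [i / (points - 1) for i in range(points)]
    return max(abs(smoothed(x, t) - f(x)) for x in xs)

errors = [sup_error(t) for t in (0.1, 0.01, 0.001)]
print(errors)   # the error shrinks as t decreases
```

The sketch stops at the smoothing step; the passage from $f_t$ to a polynomial by truncating the exponential series is numerically delicate for small t and is not attempted here.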

Now we turn to a similar theorem, in which we deal with approxi- 
mation by trigonometric polynomials. 


Theorem 2: Let f be a (real-valued or complex-valued) continuous
function defined on the closed interval [0, 1], and suppose that f(0) = f(1).
Then, for any positive number $\epsilon$ there exists a trigonometric polynomial t
(i.e., a function expressible in the form $t(x) = \sum_{k=-N}^{N} \gamma_k e^{2\pi i k x}$ for a
suitable choice of the positive integer N and scalars $\gamma_{-N}, \ldots, \gamma_N$) such that the
inequality $|f(x) - t(x)| < \epsilon$ holds everywhere in [0, 1].

Proof: For each positive integer n let the function $K_n$
be defined as follows: $K_n(x) = c_n((1 + \cos 2\pi x)/2)^n$, where the
constant $c_n$ is chosen so that $\int_0^1 K_n(x)\,dx = 1$. Let

$$t_n(x) = \int_0^1 f(y) K_n(x - y)\, dy.$$

Upon replacing $\cos 2\pi(x - y)$ in this integral by $(e^{2\pi i(x-y)} + e^{-2\pi i(x-y)})/2$,
one sees immediately that $t_n$ is a trigonometric polynomial. Following
closely the proof of the preceding theorem, we write

$$f(x) - t_n(x) = \int_0^1 \{f(x) - f(y)\} K_n(x - y)\, dy. \tag{D-9}$$

For convenience we extend the definition of f to all of R as a continuous
function by the periodicity condition, f(x + 1) = f(x). (It is at this
point we use the condition f(0) = f(1).) Similarly, we consider $K_n$
extended periodically to all of R. The integration appearing in (D-9) may
then be performed over any interval of length one; in particular, we choose
the interval $[x - \frac{1}{2}, x + \frac{1}{2}]$. Since f is uniformly continuous we can choose
a positive number $\delta$ (independent of x and less than $\frac{1}{2}$) such that

$$|f(x) - f(y)| < \epsilon/2$$

whenever $|x - y| < \delta$. Then from (D-9) we obtain (taking account of the
evenness and non-negativity of $K_n$ and using the symbol M to denote
max |f|)

$$|f(x) - t_n(x)| \le \frac{\epsilon}{2} \int_{x-\delta}^{x+\delta} K_n(x - y)\, dy + 4M \int_{x+\delta}^{x+1/2} K_n(x - y)\, dy$$

$$< \frac{\epsilon}{2} \int_{-1/2}^{1/2} K_n(u)\, du + 4M \int_{\delta}^{1/2} K_n(u)\, du$$

$$< \frac{\epsilon}{2} + 4M \cdot \frac{1}{2} \cdot \max_{\delta \le u \le 1/2} K_n(u). \tag{D-10}$$

Now, $\max_{\delta \le u \le 1/2} K_n(u) = c_n((1 + \cos 2\pi\delta)/2)^n$, and we obtain an
upper bound on $c_n$ as follows:

$$1 = c_n \int_0^1 \left(\frac{1 + \cos 2\pi u}{2}\right)^n du
> c_n \int_0^1 \left(\frac{1 + \cos 2\pi u}{2}\right)^n |\sin 2\pi u|\, du = \frac{2c_n}{\pi(n + 1)}. \tag{D-11}$$

Hence, $c_n < \pi(n + 1)/2$, and so from (D-10) we obtain

$$|f(x) - t_n(x)| < \frac{\epsilon}{2} + M\pi(n + 1)\left(\frac{1 + \cos 2\pi\delta}{2}\right)^n. \tag{D-12}$$

Since $(n + 1)((1 + \cos 2\pi\delta)/2)^n$ approaches zero as n increases, we can
choose a particular value of n such that $|f(x) - t_n(x)| < \epsilon$.
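The construction of $t_n$ can be carried out numerically. The following sketch is an illustration under assumptions not in the original: the function |x - 1/2| is an arbitrary continuous choice with f(0) = f(1), the integrals are replaced by an M-point rectangle rule on [0, 1), and the supremum is taken over a finite grid. The helper names are hypothetical.

```python
import math

M = 400   # number of quadrature nodes j/M on [0, 1)

def kernel_values(n):
    """Return c_n and the values K_n(j/M), with c_n fixed by requiring the
    (rectangle-rule) integral of K_n over [0, 1] to equal 1. For n < M the
    rectangle rule integrates the trigonometric polynomial K_n exactly."""
    raw = [((1.0 + math.cos(2.0 * math.pi * j / M)) / 2.0) ** n for j in range(M)]
    c_n = M / sum(raw)
    return c_n, [c_n * v for v in raw]

def f(x):
    """Continuous on [0, 1] with f(0) = f(1)."""
    return abs(x - 0.5)

def sup_error(n, step=8):
    """Return c_n and the largest |f(x) - t_n(x)| over the grid x = i/M."""
    c_n, K = kernel_values(n)
    fy = [f(j / M) for j in range(M)]
    worst = 0.0
    for i in range(0, M, step):
        # t_n(x) is approximated by the average of f(y_j) K_n(x - y_j);
        # K_n has period one, so the index is reduced modulo M.
        t_n = sum(fy[j] * K[(i - j) % M] for j in range(M)) / M
        worst = max(worst, abs(f(i / M) - t_n))
    return c_n, worst

results = {n: sup_error(n) for n in (4, 16, 64)}
for n, (c_n, err) in results.items():
    print(n, c_n, math.pi * (n + 1) / 2, err)
```

The printed columns allow a comparison of the computed $c_n$ with the bound $\pi(n + 1)/2$ obtained in (D-11), and show the approximation error decreasing as n grows.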





We conclude with the remark that the kernels $K_n$ which we have
employed may be replaced in the preceding argument by many others.
In particular, the Fejér kernels, defined by the equations

$$F_n(x) = \frac{1}{n + 1} \left[ \frac{\sin (n + 1)\pi x}{\sin \pi x} \right]^2,$$

play an especially important role in the theory of Fourier series. The
reader who is not acquainted with Fejér's discovery of the remarkable
properties of these kernels will find a presentation of this topic in most
books on Fourier series.


Exercises 

1. Let f be continuous in the closed rectangle $a_i \le x_i \le b_i$, i = 1, 2,
..., n, of $R^n$. Prove that for any positive number $\epsilon$ there
exists a polynomial p in the variables $x_1, x_2, \ldots, x_n$ such that
the inequality $|f - p| < \epsilon$ holds everywhere in the
aforementioned region.

2. Suppose that f is defined in the interval [a, b] and possesses
continuous derivatives up to and including the k-th order
throughout this interval. Prove that for any positive number
$\epsilon$ there exists a polynomial p such that the k + 1 inequalities
$|f^{(j)}(x) - p^{(j)}(x)| < \epsilon$, j = 0, 1, 2, ..., k, hold throughout
the given interval.


E. THE STRUCTURE OF OPEN SETS OF
REAL NUMBERS

In this appendix we shall prove the assertion made in §2-1 concerning
the structure of open subsets of R. Let O be any open subset of R. If we
set aside the trivial case that $O = \emptyset$, the set O contains a countable
infinity $r_1, r_2, r_3, \ldots$ of rational numbers. For each index k we can find
(by the definition of an open set in R) real numbers a and b such that
$r_k \in (a, b) \subseteq O$. Let $a_k$ equal the greatest lower bound of all values of a
and let $b_k$ equal the least upper bound of all values of b for which the
preceding conditions (namely $r_k \in (a, b) \subseteq O$) are satisfied. (Note that
we must allow the possibilities $a_k = -\infty$ and $b_k = +\infty$. Also, in the
case that $a_k$ [$b_k$] is finite we are exploiting the fact that every non-empty set
of real numbers which is bounded below [above] possesses a greatest
lower [least upper] bound.) Let $I_k$ be the open interval $(a_k, b_k)$. Evidently
$I_k \subseteq O$, and so $\bigcup_{k=1}^{\infty} I_k \subseteq O$. To prove the reverse inclusion, we
first observe that $r_k \in I_k$, so that $\bigcup_{k=1}^{\infty} I_k$ contains all the rational numbers
contained in O. Let s be any irrational number contained in O. Then,
since O is open, we can find an open interval (a, b) such that a < s < b
and $(a, b) \subseteq O$. Since the rational numbers form a dense subset of R,
we can find an index j such that $r_j \in (a, b)$. It follows that $s \in I_j \subseteq \bigcup_{k=1}^{\infty} I_k$.
Thus, $\bigcup_{k=1}^{\infty} I_k = O$.

Now, it is evident, from the manner in which the intervals $I_1, I_2, I_3, \ldots$
are defined, that either $I_j = I_k$ or $I_j \cap I_k = \emptyset$. Thus, O has been expressed
as the union of a finite or countably infinite collection of disjoint open
intervals.

Finally, suppose that O is expressed as a union of disjoint, open,
non-empty intervals, say $O = \bigcup_{\alpha} J_{\alpha}$. (Note that we are not assuming
that the collection $\{J_{\alpha}\}$ is finite or countably infinite.) For any index $\alpha$,
$J_{\alpha}$ must contain a rational member of O, say $r_m$ (since $J_{\alpha}$ is a non-empty
open interval). It then follows, from the manner in which the $I_k$'s were
constructed, that $J_{\alpha} \subseteq I_m$. If $J_{\alpha}$ were a proper subset of $I_m$, at least one of
the two end-points of $J_{\alpha}$, say y, would lie in $I_m$. Since $J_{\alpha}$ is open, $y \notin J_{\alpha}$,
and so y must belong to $J_{\beta}$ for some index $\beta$ different from $\alpha$; but then,
since $J_{\beta}$ is open, the intervals $J_{\alpha}$ and $J_{\beta}$ would overlap, contrary to hypothesis.
Thus, each $J_{\alpha}$ must coincide with one of the intervals $I_k$, and so we
have shown that the decomposition of O into disjoint non-empty open
intervals is unique (aside from order).
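For an open set given as a finite union of open intervals, the disjoint component intervals can be computed directly. The following is a minimal sketch restricted to that finite case, which is an assumption made for illustration; the function name `components` is hypothetical, and the general (infinite) case treated in the text is not covered.

```python
def components(intervals):
    """Disjoint open component intervals of a finite union of open
    intervals, each given as a pair (a, b) with a < b."""
    result = []
    for a, b in sorted(iv for iv in intervals if iv[0] < iv[1]):
        if result and a < result[-1][1]:
            # (a, b) overlaps the most recent component: enlarge it.
            result[-1] = (result[-1][0], max(result[-1][1], b))
        else:
            # (a, b) starts at or beyond the right end-point of the most
            # recent component, so it begins a new component.
            result.append((a, b))
    return result

print(components([(0, 1), (1, 2), (0.5, 1.5), (3, 4)]))
print(components([(0, 1), (1, 2)]))   # the point 1 is missing, so two components remain
```

The strict inequality in the overlap test reflects the openness of the intervals: (0, 1) and (1, 2) do not merge, because the point 1 belongs to neither.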


F. INFINITE SERIES AND THE NUMBER SYSTEM [0, +∞]


Given any sequence $a_1, a_2, a_3, \ldots$ of real numbers, by the series
$a_1 + a_2 + a_3 + \cdots$ we mean the sequence of partial sums $s_1, s_2, s_3, \ldots$,
where $s_n = \sum_{k=1}^{n} a_k$. The series is said to be convergent if the sequence of
partial sums is convergent; when this occurs, the sum of the series is
defined to be the number $\lim_{n \to \infty} s_n$. If the series is not convergent, it is
said to be divergent.

If the terms of the series $a_1 + a_2 + a_3 + \cdots$ are all non-negative,
then the corresponding sequence of partial sums is monotone non-
decreasing, and from the fundamental principles of real analysis it follows
that the series converges iff the set of partial sums is bounded above; for
example, the series $1/1^2 + 1/2^2 + 1/3^2 + \cdots$ is convergent, since it is
easily demonstrated that all the partial sums satisfy the inequality $s_n < 2$.
(Proof: For n > 1,

$$s_n = 1 + \sum_{k=2}^{n} \frac{1}{k^2} < 1 + \sum_{k=2}^{n} \frac{1}{k(k-1)} = 1 + \sum_{k=2}^{n} \left(\frac{1}{k-1} - \frac{1}{k}\right) = 2 - \frac{1}{n} < 2.)$$




It is a remarkable and important fact that if a series consisting of non-
negative terms is convergent, any rearrangement of this series (we shall
not give a formal definition of a rearrangement; the meaning of this
expression should be clear) is also convergent, and furthermore the
original series and the new series have the same sum. Actually this
result, remarkable enough in itself, can be extended considerably; we
shall content ourselves here with stating the fact that if the original series
is decomposed into a number of series (even infinitely many), each of the
series of the latter collection is convergent, and the sum of their respective
sums agrees with the sum of the original series. For example, let us break
up the convergent series $a_1 + a_2 + a_3 + \cdots$ into infinitely many series,
$S_1, S_2, S_3, \ldots$, in the following way:

$S_1$: $a_1 + a_2 + a_4 + a_7 + a_{11} + \cdots$,

$S_2$: $a_3 + a_5 + a_8 + a_{12} + \cdots$,

$S_3$: $a_6 + a_9 + a_{13} + \cdots$,

$S_4$: $a_{10} + a_{14} + \cdots$,

$S_5$: $a_{15} + \cdots$,

$S_6$: $a_{21} + \cdots$,

and so forth. Then each of the series $S_1, S_2, S_3, \ldots$ is convergent, and if we
denote their sums by $\sigma_1, \sigma_2, \sigma_3, \ldots$ respectively, then the series
$\sigma_1 + \sigma_2 + \sigma_3 + \cdots$ is convergent, and its sum equals the sum of the original series,
$a_1 + a_2 + a_3 + \cdots$.

Next, we define a series of real, but not necessarily non-negative,
numbers $a_1 + a_2 + a_3 + \cdots$ to be absolutely convergent if the series
$|a_1| + |a_2| + |a_3| + \cdots$ is convergent. Note that we do not include in the
definition the hypothesis that the original series is convergent; it is a
theorem that an absolutely convergent series, as defined, is convergent, and
that the assertion concerning the rearrangement of a convergent series of
non-negative numbers holds true in the present case also, as does the
assertion concerning the breakup of a series into a collection of series.

The situation changes radically if the series is convergent but not
absolutely convergent; when this happens, the series is said to be
conditionally convergent. For example, the series $1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$
is not absolutely convergent, since the series $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$ is
known to be divergent, but it is convergent; this is guaranteed by the
alternating series test, with which the reader is presumably familiar.
Furthermore, by the test just cited, it is known that the sum of this series
is less than $s_1$ (= 1) and more than $s_2$ (= 1/2); in fact, the sum is log 2
(= 0.693...). However, we shall need only the fact that the sum of this
series, which we shall denote by $\sigma$, is not zero. The series
$\frac{1}{2} - \frac{1}{4} + \frac{1}{6} - \frac{1}{8} + \cdots$ then converges to $\sigma/2$, as does the series
$0 + \frac{1}{2} + 0 - \frac{1}{4} + 0 + \frac{1}{6} + 0 - \frac{1}{8} + \cdots$. (We do not prove here the elementary fact that when a
convergent series is multiplied by a fixed factor, the new series is also
convergent and its sum is equal to the product of the fixed factor and the
sum of the original series, nor the fact that the convergence of a series and
the value of its sum are unaffected by inserting any finite number of zeros
between any pairs of consecutive terms or by suppressing any zeros which
may appear; also, we shall accept the fact that two convergent series,
when added termwise, furnish a convergent series whose sum is obtained
by adding the sums of the two given series.) Thus, the series
$1 + 0 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + 0 + \frac{1}{7} - \frac{1}{4} + \frac{1}{9} + 0 + \frac{1}{11} - \frac{1}{6} + \cdots$, and hence the series
$1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \frac{1}{9} + \frac{1}{11} - \frac{1}{6} + \cdots$, must converge to $3\sigma/2$.
Now it is readily seen that this series is a rearrangement of the original
series, obtained by taking the first two positive terms, the first negative
term, the next two positive terms, the second negative term, and so forth.
Since $\sigma \ne 0$, we see that the sum of the original series has been altered by
this rearrangement. This example (which illustrates a very remarkable
theorem which is developed in the accompanying exercises) shows that
conditional convergence is a much more fragile phenomenon than
absolute convergence.
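The effect of the rearrangement can be observed numerically. The sketch below is an illustration, not part of the original text; the function names are hypothetical, and only finitely many terms are summed, so the printed values are approximations to the respective sums.

```python
import math

def alternating_partial_sum(n):
    """Partial sum s_n of 1 - 1/2 + 1/3 - 1/4 + ..."""
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

def rearranged_partial_sum(blocks):
    """Sum of the first `blocks` groups of the rearranged series, each group
    consisting of two positive terms followed by one negative term:
    1/(4m - 3) + 1/(4m - 1) - 1/(2m) for m = 1, 2, 3, ..."""
    total = 0.0
    for m in range(1, blocks + 1):
        total += 1.0 / (4 * m - 3) + 1.0 / (4 * m - 1) - 1.0 / (2 * m)
    return total

sigma = math.log(2.0)
original = alternating_partial_sum(200000)
rearranged = rearranged_partial_sum(200000)
print(original, sigma)            # close to log 2
print(rearranged, 1.5 * sigma)    # close to (3/2) log 2
```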

Returning now to series of non-negative terms, we remark that it is
often convenient to assign a sum to such a series even when it is not
convergent. (In particular, as is evident in Chapter 2, this is very desirable
in developing the theory of Lebesgue measure.) We therefore extend $R^+$,
the non-negative half of the real number system, by adding to it one new
member, denoted $+\infty$ (the + sign is often omitted), having the following
properties: (i) $(+\infty) + a = +\infty$, where a is either $+\infty$ or a member of
$R^+$; (ii) $a \cdot (+\infty) = +\infty$ if $a = +\infty$ or if a is any non-zero member of
$R^+$, but $0 \cdot (+\infty) = 0$; (iii) $(+\infty) - a = +\infty$ if a is any member of $R^+$;
(iv) $a < +\infty$ (or $+\infty > a$) for any member a of $R^+$; (v) $(+\infty) - (+\infty)$
is undefined. We denote this enlargement of $R^+$ by the symbol $[0, +\infty]$
(in contrast to $[0, +\infty)$, which is the same as $R^+$).

We then assign to any divergent series whose terms belong to $R^+$ as a
sum the number $+\infty$, and we also assign this value as the sum of any
series whose terms are members of $[0, +\infty]$ provided that at least one
term of the series is $+\infty$. It then becomes almost self-evident that the
theorems concerning convergent series of non-negative real numbers
remain valid in the more general class of series which we have just
introduced.
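The rules (i)-(v) can be modelled in a few lines. The following is a sketch under assumptions not in the original: the function names are hypothetical, and floating-point infinity stands in for $+\infty$. Note that ordinary floating-point arithmetic yields "not a number" for the product of zero and infinity, so rule (ii) has to be imposed explicitly.

```python
import math

INF = math.inf   # stands in for the new member +infinity

def ext_add(a, b):
    """a + b in [0, +inf]; rule (i) gives (+inf) + a = +inf."""
    return a + b

def ext_mul(a, b):
    """a * b in [0, +inf], with the convention 0 * (+inf) = 0 of rule (ii)."""
    if a == 0 or b == 0:
        return 0.0
    return a * b

def ext_sub(a, b):
    """a - b for b <= a; rule (v) leaves (+inf) - (+inf) undefined."""
    if a == INF and b == INF:
        raise ValueError("(+inf) - (+inf) is undefined")
    if b > a:
        raise ValueError("the difference would leave [0, +inf]")
    return a - b

def ext_sum(terms):
    """Sum of finitely many members of [0, +inf]."""
    total = 0.0
    for term in terms:
        total = ext_add(total, term)
    return total

print(ext_mul(0, INF))         # 0.0 by rule (ii)
print(ext_mul(2, INF))         # inf
print(ext_sub(INF, 5))         # inf by rule (iii)
print(ext_sum([1, INF, 2]))    # inf, since one term is +inf
```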

Exercises 

1. (a) Prove that a conditionally convergent series (of real numbers)
must contain infinitely many positive terms and infinitely
many negative terms.

(b) Show that the series consisting of the positive terms of a
conditionally convergent series must be divergent (or, in the
terminology introduced in the last paragraph, the sum of the
series must be $+\infty$). (Of course, a similar result must hold
for the series formed from the negative terms of the original
series.)

2. Prove the following remarkable theorem, due to Riemann: If
the series $a_1 + a_2 + a_3 + \cdots$ is conditionally convergent and
if x is any real number, it is possible to rearrange the given
series so as to converge to the sum x. Hint: Taking account of
(b) of the preceding exercise, select enough positive terms from
the beginning of the given series so that their sum exceeds x.
(We assume here for convenience that x > 0; if x < 0, only
a trivial modification is needed.) Then subtract from this sum
enough negative terms so that a number less than x is obtained.
Then return to the remaining positive terms until a sum
exceeding x is once again obtained, and so forth. (For
simplicity, assume at first that the given series contains no
zero terms; these may easily be accounted for in the rearrangement
at the very end of the task.)


G. LIMIT SUPERIOR AND LIMIT 
INFERIOR 

Let $a_1, a_2, a_3, \ldots$ be a given sequence of real numbers. For convenience
we shall assume at present that this sequence is bounded above and
below, but later we shall indicate very briefly how the ideas
developed here are extended when the given sequence is unbounded in at
least one direction.

By the Bolzano-Weierstrass theorem, the given sequence
contains a convergent subsequence; furthermore, if the given sequence
is convergent to a number l, then every subsequence also converges to l.
On the other hand, if the given sequence is not convergent, there exist
at least two distinct numbers, $l_1$ and $l_2$, such that it is possible to extract
from the original sequence a subsequence which converges to $l_1$ and a
subsequence which converges to $l_2$. For example, the (bounded)
sequence 0, 1, 0, 1, 0, 1, ... possesses the subsequences 0, 0, 0, ...
and 1, 1, 1, ..., which converge to 0 and 1 respectively; it is not difficult
to see that no other numbers can be limits of subsequences of the given
sequence. The sequence 0, 1, 0, 1, 0, 1, ... also serves
to show that the limit of a subsequence need not be a limit-point of the
set of numbers appearing in the sequence,
for the set consisting exclusively of the numbers 0 and 1, being finite, has
no limit-points. (On the other hand, as pointed out in Chapter 1, any
limit-point of the set of numbers appearing in any sequence is the limit
of some suitably chosen subsequence.)

Thus, given any bounded sequence $a_1, a_2, a_3, \ldots$ of real numbers, the
set L of all possible limits of convergent subsequences is non-empty;
furthermore (cf. Exercise 1) L is closed. Since $L \subseteq [a, b]$, where [a, b]
is any closed interval containing all members of the given sequence, it
follows that L must be compact, and so it contains (not merely has) a
least upper bound and a greatest lower bound. These numbers are known,
respectively, as the upper limit, or limit superior, and the lower limit,
or limit inferior, of the given sequence; the notations $\limsup_{n \to \infty} a_n$, or
$\overline{\lim}_{n \to \infty}\, a_n$, and $\liminf_{n \to \infty} a_n$, or $\underline{\lim}_{n \to \infty}\, a_n$, are employed. (The notation
$n \to \infty$ is really unnecessary and is often omitted.)

We now state the following theorem, which gives two alternative
characterizations of the upper and lower limits; the proof is left to the
reader as Exercise 6.

Theorem 1: (a) Let $a_1, a_2, a_3, \ldots$ be any bounded sequence of real
numbers and let $\overline{\lim}\, a_n$ be denoted by $\alpha$. Then $\alpha$ is the unique number
possessing the following property: For every positive number $\epsilon$, the
inequality $a_n < \alpha + \epsilon$ is satisfied for all indices n exceeding some index N
(which may depend on $\epsilon$), while the inequality $a_n > \alpha - \epsilon$ is satisfied for
infinitely many indices n. (A dual characterization holds for $\underline{\lim}\, a_n$, of
course.)

(b) As in (a), let $a_1, a_2, a_3, \ldots$ be a bounded sequence of real numbers,
and let $s_n = \sup \{a_n, a_{n+1}, a_{n+2}, \ldots\}$. Then the sequence $s_1, s_2, s_3, \ldots$ is
monotone non-increasing, bounded below by $\inf \{a_1, a_2, a_3, \ldots\}$, and hence
convergent to a limit. This limit coincides with $\overline{\lim}\, a_n$. (As in (a), a dual
characterization holds for $\underline{\lim}\, a_n$.)
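Part (b) of the theorem suggests a direct way to estimate the upper and lower limits from a long initial segment of a sequence. The sketch below is an illustration under assumptions not in the original: the suprema are taken over a finite list only, so they are stand-ins for the true tail suprema, and the sequence $(-1)^n(1 + 1/n)$ is an arbitrary example with upper limit 1 and lower limit -1. The function names are hypothetical.

```python
def tail_sups(a):
    """For a finite list a, return the list of max(a[i], a[i+1], ..., a[-1]),
    a finite stand-in for s_n = sup{a_n, a_(n+1), ...}."""
    sups = [0.0] * len(a)
    running = float("-inf")
    for i in range(len(a) - 1, -1, -1):
        running = max(running, a[i])
        sups[i] = running
    return sups

def tail_infs(a):
    """Dual construction: the running minima of the tails."""
    return [-s for s in tail_sups([-x for x in a])]

N = 2000
a = [(-1) ** n * (1.0 + 1.0 / n) for n in range(1, N + 1)]
s = tail_sups(a)
t = tail_infs(a)
print(s[999], t[999])   # estimates of the upper and lower limits from n = 1000 on
```

The list `s` is non-increasing and the list `t` is non-decreasing, in agreement with the monotonicity asserted in part (b).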

If the sequence $a_1, a_2, a_3, \ldots$ is unbounded in either one or both
directions, it may be impossible to apply the preceding definitions if we
restrict ourselves to operating within the real number system R. Without
going into details, we mention that the brief explanation provided in §2-4
of the extended real number system makes it evident how to proceed
in such a case. We illustrate with a few simple examples:

(a) For the sequence 0, 1, 0, 2, 0, 3, ... the lower limit is 0, the
upper limit is $+\infty$.

(b) For the sequence 1, -2, 3, -4, 5, -6, ... the lower limit is $-\infty$,
the upper limit is $+\infty$.

(c) For the sequence 1, 2, 3, 4, 5, 6, ... the lower limit and the
upper limit are both equal to $+\infty$.

The reader may find it helpful, in mastering the ideas presented, to 
study the following important theorem. 





Theorem 2 (Cauchy-Hadamard): Let $a_1, a_2, a_3, \ldots$ be any
sequence of complex numbers. Then the series $a_1 z + a_2 z^2 + a_3 z^3 + \cdots$
converges for any complex number z satisfying the inequality
$|z| < (\overline{\lim}\, |a_n|^{1/n})^{-1}$ and diverges if $|z| > (\overline{\lim}\, |a_n|^{1/n})^{-1}$. (Of course, if

$$\overline{\lim}\, |a_n|^{1/n} = 0$$

this is to be interpreted as meaning that the series converges for all values of
z, while if $\overline{\lim}\, |a_n|^{1/n} = +\infty$ the conclusion is that the series converges
only when z = 0.)

Proof: For convenience we assume that $\overline{\lim}\, |a_n|^{1/n}$ is positive and
finite, so that we may express it in the form 1/R, where $0 < R < +\infty$.
(R is known as the radius of convergence of the given series; the reason
for this terminology is obvious from the theorem.) The validity of the
conclusion in the extreme cases R = 0, R = $+\infty$ will be apparent from
the proof to be presented.

If |z| < R, then for any positive number $\epsilon$, the inequality
$|a_n z^n|^{1/n} < (1/R + \epsilon)\,|z|$ holds for all sufficiently large n. Since |z| < R, $\epsilon$ can be
chosen so small that $(1/R + \epsilon)\,|z| < 1$. Denoting the left side of this
inequality by $\alpha$, we conclude that for all sufficiently large n the inequality
$|a_n z^n| < \alpha^n$ holds. Since the series $\alpha + \alpha^2 + \alpha^3 + \cdots$ converges, the
series must converge. (Indeed, the series converges absolutely, and if z
is restricted by the further condition $|z| \le R'$, where R' is any number
smaller than R, the convergence is uniform.)

Conversely, if |z| > R, it is readily seen by examining the preceding
paragraph that the inequality $|a_n z^n| > 1$ must hold for infinitely many
values of n (but not necessarily for all sufficiently large values of n).
Thus, the terms of the series do not approach zero with increasing n;
hence the series cannot possibly converge.

As can be shown by simple examples (cf. Exercise 7), the series may
converge for some values, no values, or all values of z satisfying the
condition |z| = R.
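The quantity $\overline{\lim}\, |a_n|^{1/n}$ can be estimated from finitely many coefficients. The sketch below is an illustration under assumptions not in the original: the supremum over a tail is replaced by a maximum over the indices between n_max/2 and n_max, which is only a heuristic stand-in for the upper limit, and the two coefficient sequences are arbitrary examples. The function names are hypothetical.

```python
def root_test_estimate(coeff, n_max):
    """Largest value of |a_n|^(1/n) for n_max/2 < n <= n_max, used as a
    finite stand-in for the upper limit of |a_n|^(1/n)."""
    return max(abs(coeff(n)) ** (1.0 / n)
               for n in range(n_max // 2 + 1, n_max + 1))

def a(n):
    """Assumed example: a_n = 2^n for even n and a_n = 1 for odd n.
    The sequence |a_n|^(1/n) oscillates between 2 and 1."""
    return 2.0 ** n if n % 2 == 0 else 1.0

def b(n):
    """Assumed example: a_n = 1/n, for which |a_n|^(1/n) tends to 1 slowly."""
    return 1.0 / n

est_a = root_test_estimate(a, 200)
est_b = root_test_estimate(b, 200)
print(est_a, 1.0 / est_a)   # upper limit near 2, radius of convergence near 1/2
print(est_b, 1.0 / est_b)   # upper limit near 1, radius of convergence near 1
```

The first example shows why the upper limit, rather than an ordinary limit, appears in the theorem: the sequence $|a_n|^{1/n}$ has no limit, yet the radius of convergence is determined by its largest subsequential limit.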


Exercises

1. Prove that the set L defined previously is closed.

2. Construct a sequence whose terms all lie in the open interval
(0, 1) and whose set L consists of all points of the closed
interval [0, 1].

3. Let $b_1, b_2, b_3, \ldots$ be a rearrangement of the sequence
$a_1, a_2, a_3, \ldots$. Show that $\overline{\lim}\, b_n = \overline{\lim}\, a_n$
and $\underline{\lim}\, b_n = \underline{\lim}\, a_n$.

4. Show that $\overline{\lim}\, (-a_n) = -\underline{\lim}\, a_n$ and similarly that
$\underline{\lim}\, (-a_n) = -\overline{\lim}\, a_n$.

5. Show that $\overline{\lim}\, (a_n + b_n) \le \overline{\lim}\, a_n + \overline{\lim}\, b_n$ and similarly that
$\underline{\lim}\, (a_n + b_n) \ge \underline{\lim}\, a_n + \underline{\lim}\, b_n$.

6. Prove Theorem 1.

7. Consider the series $z + z^2 + z^3 + \cdots$, $z + z^2/2 + z^3/3 + \cdots$,
and $z + z^2/2^2 + z^3/3^2 + \cdots$. Show that for each of these
series the radius of convergence equals one, and that the first
series converges for no value of z whose modulus is one, while
the third series converges for all such values and the second
converges for at least one such value and diverges for at
least one such value.


H. THE FOURIER TRANSFORM IN $L^2(R)$

In this appendix we indicate briefly how the Fourier transform may
be interpreted as a unitary operator on $L^2(R)$.

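Let $f_1$ and $f_2$ be the characteristic functions of the (finite) intervals
$(a_1, b_1)$ and $(a_2, b_2)$ respectively. It is not assumed that these intervals are
disjoint. The Fourier transforms $\hat{f}_1, \hat{f}_2$ of these functions are defined, for
all real values of t, as follows:

$$\hat{f}_k(t) = \int_{-\infty}^{\infty} f_k(x)\, e^{-2\pi i x t}\, dx = \int_{a_k}^{b_k} e^{-2\pi i x t}\, dx.$$

By an elementary computation one obtains

$$\hat{f}_k(t) = \frac{e^{-2\pi i a_k t} - e^{-2\pi i b_k t}}{2\pi i t}
\quad\text{and}\quad
\hat{f}_1(t)\,\overline{\hat{f}_2(t)} = \frac{g(t) + i\,h(t)}{4\pi^2 t^2}, \tag{H-1}$$

where

$$g(t) = \cos 2\pi(b_2 - b_1)t + \cos 2\pi(a_2 - a_1)t - \cos 2\pi(b_2 - a_1)t - \cos 2\pi(a_2 - b_1)t$$

and

$$h(t) = \sin 2\pi(b_2 - b_1)t + \sin 2\pi(a_2 - a_1)t - \sin 2\pi(b_2 - a_1)t - \sin 2\pi(a_2 - b_1)t.$$

Since $h(t)/t^2$ is an odd function, one concludes from (H-1) that

$$\int_{-\infty}^{\infty} \hat{f}_1(t)\,\overline{\hat{f}_2(t)}\, dt = \frac{1}{4\pi^2} \int_{-\infty}^{\infty} \frac{g(t)}{t^2}\, dt, \tag{H-2}$$

and four uses of the identity $\cos 2v = 1 - 2\sin^2 v$ lead to the equality

$$\int_{-\infty}^{\infty} \hat{f}_1(t)\,\overline{\hat{f}_2(t)}\, dt = \frac{1}{2\pi^2} \int_{-\infty}^{\infty} \frac{1}{t^2} \{\sin^2 \pi(b_2 - a_1)t + \sin^2 \pi(a_2 - b_1)t
- \sin^2 \pi(b_2 - b_1)t - \sin^2 \pi(a_2 - a_1)t\}\, dt. \tag{H-3}$$

For any real number c, $\int_{-\infty}^{\infty} (\sin^2 ct/t^2)\, dt = \pi |c|$ (cf. Exercises 3 and 10),
and so (H-3) furnishes the equality

$$\int_{-\infty}^{\infty} \hat{f}_1(t)\,\overline{\hat{f}_2(t)}\, dt = \tfrac{1}{2}\{|b_2 - a_1| + |a_2 - b_1| - |b_2 - b_1| - |a_2 - a_1|\}. \tag{H-4}$$

By considering all possible relationships among the quantities $a_1, b_1, a_2, b_2$
(subject, of course, to the restrictions $a_1 < b_1$ and $a_2 < b_2$) one readily
confirms that the right side of (H-4) equals the length of the intersection
of the two given intervals. Therefore,

$$\int_{-\infty}^{\infty} \hat{f}_1(t)\,\overline{\hat{f}_2(t)}\, dt = \int_{-\infty}^{\infty} f_1(x)\,\overline{f_2(x)}\, dx. \tag{H-5}$$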
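The claim that the right side of (H-4) equals the length of the intersection of the two intervals can be confirmed case by case. The sketch below is an illustration, not part of the original text; the function names and the list of test intervals are hypothetical choices covering disjoint, overlapping, nested, abutting, and identical intervals.

```python
def h4_right_side(a1, b1, a2, b2):
    """Right side of (H-4) for the intervals (a1, b1) and (a2, b2)."""
    return 0.5 * (abs(b2 - a1) + abs(a2 - b1) - abs(b2 - b1) - abs(a2 - a1))

def intersection_length(a1, b1, a2, b2):
    """Length of the intersection of (a1, b1) and (a2, b2)."""
    return max(0.0, min(b1, b2) - max(a1, a2))

cases = [
    (0, 1, 2, 3),   # disjoint, first interval on the left
    (2, 3, 0, 1),   # disjoint, first interval on the right
    (0, 2, 1, 3),   # partial overlap
    (0, 3, 1, 2),   # second interval inside the first
    (1, 2, 0, 3),   # first interval inside the second
    (0, 1, 1, 2),   # abutting intervals
    (0, 1, 0, 1),   # identical intervals
]
for case in cases:
    print(case, h4_right_side(*case), intersection_length(*case))
```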

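We now proceed to extend the class of functions for which (H-5)
holds true. If $g_1$ and $g_2$ are any step-functions vanishing outside some
sufficiently large interval, they may be expressed in the form

$$g_1(x) = \sum_{j=1}^{m} \alpha_j f_j^{(1)}(x), \qquad g_2(x) = \sum_{k=1}^{n} \beta_k f_k^{(2)}(x),$$

where the functions $f_1^{(1)}, \ldots, f_m^{(1)}$, and similarly the
functions $f_1^{(2)}, \ldots, f_n^{(2)}$, are characteristic functions of disjoint intervals. Defining
the Fourier transforms $\hat{g}_1$ and $\hat{g}_2$ of $g_1$ and $g_2$ as

$$\int_{-\infty}^{\infty} g_1(x)\, e^{-2\pi i x t}\, dx \quad\text{and}\quad \int_{-\infty}^{\infty} g_2(x)\, e^{-2\pi i x t}\, dx$$

respectively, we obtain (taking account of (H-5))

$$\int_{-\infty}^{\infty} \hat{g}_1(t)\,\overline{\hat{g}_2(t)}\, dt = \sum_{j=1}^{m} \sum_{k=1}^{n} \alpha_j \overline{\beta_k} \int_{-\infty}^{\infty} \hat{f}_j^{(1)}(t)\,\overline{\hat{f}_k^{(2)}(t)}\, dt \tag{H-6}$$

$$= \sum_{j=1}^{m} \sum_{k=1}^{n} \alpha_j \overline{\beta_k} \int_{-\infty}^{\infty} f_j^{(1)}(x)\,\overline{f_k^{(2)}(x)}\, dx = \int_{-\infty}^{\infty} g_1(x)\,\overline{g_2(x)}\, dx. \tag{H-7}$$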


Now let $h_1$ and $h_2$ be any two functions belonging to $L^2(R)$ and
vanishing outside some sufficiently large interval [a, b]. By the Schwarz
inequality we obtain

$$\left(\int_{-\infty}^{\infty} |h_k(x)|\, dx\right)^2 = \left(\int_a^b 1 \cdot |h_k(x)|\, dx\right)^2 \le \left(\int_a^b 1^2\, dx\right)\left(\int_a^b |h_k(x)|^2\, dx\right) = (b - a)\,\|h_k\|^2 < \infty,$$

and so we have shown that the
functions $h_1$ and $h_2$ belong to $L^1(R)$. Since $|e^{-2\pi i x t}| = 1$ for all real values
of t and x, the integrals $\int_{-\infty}^{\infty} h_k(x)\, e^{-2\pi i x t}\, dx$ exist for all real t; in fact, the
value of the integral depends continuously on t and approaches zero as
$|t| \to \infty$. (Cf. Exercise 7.) These functions are, of course, called the
Fourier transforms, $\hat{h}_1$ and $\hat{h}_2$, of $h_1$ and $h_2$. By a suitable approximation
argument, which we do not present here, it can be shown that (H-7)
generalizes to the present case:

$$\int_{-\infty}^{\infty} \hat{h}_1(t)\,\overline{\hat{h}_2(t)}\, dt = \int_{-\infty}^{\infty} h_1(x)\,\overline{h_2(x)}\, dx. \tag{H-8}$$

J — 00 J— 00 

(The integral on the right is meaningful; this is guaranteed by the
Schwarz inequality, even without the restriction that the functions $h_1$ and
$h_2$ vanish outside $[a, b]$.) However, if $h_1$ and $h_2$ fail to vanish outside
some finite interval the formal definition $\hat h_k(t) = \int_{-\infty}^{\infty} h_k(x)\,e^{-2\pi i x t}\,dx$ may
be meaningless. In this case we proceed as follows. Let $h$ be any member
of $L^2(R)$ and for each positive integer $n$ let $h^{(n)}$ coincide with $h$ in $[-n, n]$
and vanish outside this interval. Then we may construct the sequence of
Fourier transforms $\hat h^{(1)}, \hat h^{(2)}, \hat h^{(3)}, \ldots$. Replacing both $h_1$ and $h_2$ in (H-8)
by $h^{(n)} - h^{(m)}$, we obtain

$$\|\hat h^{(n)} - \hat h^{(m)}\|_2 = \|h^{(n)} - h^{(m)}\|_2. \tag{H-9}$$

Since the sequence $h^{(1)}, h^{(2)}, h^{(3)}, \ldots$ is Cauchy, (H-9) guarantees that
the corresponding sequence $\hat h^{(1)}, \hat h^{(2)}, \hat h^{(3)}, \ldots$ is also Cauchy. By the
Riesz-Fischer theorem it then follows that there exists a function $\hat h$ in
$L^2(R)$ such that $\lim_{n \to \infty} \|\hat h - \hat h^{(n)}\|_2 = 0$, and we now define $\hat h$ to be the
Fourier transform of $h$. (The function $\hat h$ is, of course, uniquely determined
up to a null-set; if the integral $\int_{-\infty}^{\infty} h(x)\,e^{-2\pi i x t}\,dx$ does exist for all $t$, the
corollary to the Riesz-Fischer theorem guarantees the equivalence of the
two definitions of $\hat h$.) Replacing $h_1$ and $h_2$ in (H-8) by the corresponding
functions $h_1^{(n)}$ and $h_2^{(n)}$ and letting $n$ increase without bound, we find that
(H-8) continues to hold for any pair of functions in $L^2(R)$ and their
Fourier transforms.

Thus, the Fourier transform is a linear transformation mapping
$L^2(R)$ into itself and preserving inner products. (Observe that (H-8) can
be written in the form $(\hat h_1, \hat h_2) = (h_1, h_2)$. Setting $h_1 = h_2 = h$, we obtain
$\|\hat h\|_2 = \|h\|_2$.) The question now arises as to whether this transformation
maps $L^2(R)$ onto itself. The proof that this is indeed so will be sketched
briefly; thus, the Fourier transform will be shown to be a unitary operator
on $L^2(R)$.
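The norm-preserving property can be illustrated numerically for the simplest case. The sketch below is an assumption-laden illustration rather than part of the proof: the names `indicator_transform` and `truncated_energy`, the cutoff, and the step count are all hypothetical choices. It evaluates the closed form (H-1) for the characteristic function of $(0, 1)$ and approximates $\int |\hat f(t)|^2\,dt$ by a midpoint rule on a truncated range; the result should be close to $\|f\|_2^2 = b - a = 1$, with a small deficit caused by the discarded tails.

```python
import cmath
import math


def indicator_transform(t, a, b):
    """Closed form (H-1) for the transform of the characteristic function of (a, b)."""
    if t == 0.0:
        return complex(b - a)  # limiting value of (H-1) as t tends to 0
    numerator = cmath.exp(-2j * math.pi * a * t) - cmath.exp(-2j * math.pi * b * t)
    return numerator / (2j * math.pi * t)


def truncated_energy(a, b, cutoff=100.0, steps=200_000):
    """Midpoint-rule approximation of the integral of |f_hat|^2 over [-cutoff, cutoff]."""
    width = 2.0 * cutoff / steps
    total = 0.0
    for m in range(steps):
        t = -cutoff + (m + 0.5) * width
        total += abs(indicator_transform(t, a, b)) ** 2
    return total * width


# Squared norm of the transform of the characteristic function of (0, 1).
energy = truncated_energy(0.0, 1.0)
```

Because $|\hat f(t)|^2$ decays only like $1/t^2$, the truncation error is of order 1/cutoff, so agreement to two or three decimal places is all that should be expected here.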

Let $f$ be the characteristic function of the interval $(a, b)$. Then
$\hat f(t) = (e^{-2\pi i a t} - e^{-2\pi i b t})/2\pi i t$; temporarily replacing (for convenience) the
symbol $\hat f$ by $g$, we may express $\hat g(t)$ as follows:

$$\hat g(t) = \lim_{n \to \infty} \int_{-n}^{n} \frac{e^{-2\pi i a x} - e^{-2\pi i b x}}{2\pi i x}\,e^{-2\pi i x t}\,dx. \tag{H-10}$$

(Strictly speaking, the limit is to be taken in the sense of the norm on
$L^2(R)$, but it will be seen that the limit exists in the ordinary sense;
referring again to the corollary to the Riesz-Fischer theorem, one sees
that the two limits agree almost everywhere.) A simple computation shows
that (H-10) may be rewritten in the form

$$\hat g(t) = \lim_{n \to \infty} \int_{-n}^{n} \frac{\sin \pi(b - a)x}{\pi x}\,e^{-\pi i (2t + a + b)x}\,dx. \tag{H-11}$$

Since the imaginary part of the integrand is odd, it may be discarded, and
so (H-11) may be rewritten as follows:

$$\hat g(t) = \lim_{n \to \infty} \int_{-n}^{n} \frac{\sin \pi(b - a)x \,\cos \pi(2t + a + b)x}{\pi x}\,dx = \frac{1}{2\pi} \lim_{n \to \infty} \left( \int_{-n}^{n} \frac{\sin 2\pi(t + b)x}{x}\,dx - \int_{-n}^{n} \frac{\sin 2\pi(t + a)x}{x}\,dx \right). \tag{H-12}$$


Taking account of Exercises 3 and 10 we obtain

$$\hat g(t) = \begin{cases} 1 & \text{if } -b < t < -a, \\ \tfrac{1}{2} & \text{if } t = -a \text{ or } t = -b, \\ 0 & \text{otherwise.} \end{cases} \tag{H-13}$$


Thus, the equality $\hat g(t) = f(-t)$ holds everywhere except at $t = -a$ and
at $t = -b$; if we had assigned to $f(a)$ and $f(b)$ the value $\tfrac{1}{2}$ instead of zero,
the equality $\hat g(t) = f(-t)$ would have held without exception. In any case,
the result just obtained may be written in the following striking form:

$$\hat{\hat f}(t) = f(-t) \quad \text{almost everywhere.} \tag{H-14}$$

Replacing $t$ by $-t$, we obtain

$$\hat{\hat f}(-t) = f(t) \quad \text{almost everywhere.} \tag{H-15}$$


It is evident that (H-15) will continue to hold if $f$ is any step-function
vanishing outside some finite interval. By an approximation argument
similar to that indicated earlier in this appendix, it can be shown that
every function in $L^2(R)$ satisfies (H-15). That is to say, given any function
$f$ belonging to $L^2(R)$, form its Fourier transform $\hat f$ and then form the
function $h$ defined by the equation $h(t) = \hat f(-t)$. Then $f = \hat h$. Thus,
every member of $L^2(R)$ is the Fourier transform of some member of
$L^2(R)$, and so the Fourier transform is a unitary operator. For emphasis
we repeat that, in general, the Fourier transform $\hat f$ must be interpreted as
the limit in the norm of $L^2(R)$ of the integrals $\int_{-n}^{n} f(x)\,e^{-2\pi i x t}\,dx$ rather
than as the integral $\int_{-\infty}^{\infty} f(x)\,e^{-2\pi i x t}\,dx$. However, with the understanding
that the latter integral is to be interpreted in the sense explained previously,
we observe that (H-15) can be written in the following form, known as
the Fourier inversion formula:

$$\text{If } g(t) = \int_{-\infty}^{\infty} f(x)\,e^{-2\pi i x t}\,dx, \text{ then } f(x) = \int_{-\infty}^{\infty} g(t)\,e^{2\pi i x t}\,dt. \tag{H-16}$$

Since, as shown previously (cf. (H-15)), two applications of the
Fourier transform convert $f(x)$ into $f(-x)$, it follows that four applications
carry the function $f$ into itself. Thus, the Fourier transform is a fourth
root of the identity operator.
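A finite-dimensional analogue makes the "fourth root of the identity" statement easy to experiment with. The sketch below uses the unitary discrete Fourier transform on a vector of length $n$, which is an assumed stand-in and not the transform on $L^2(R)$ treated in this appendix; the function names and the sample vector are hypothetical. Two applications reverse the index modulo $n$ (the discrete counterpart of $x \to -x$), and four applications restore the original vector up to rounding.

```python
import cmath
import math


def unitary_dft(values):
    """Discrete Fourier transform normalized by 1/sqrt(n), so that it is unitary."""
    n = len(values)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(values[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def max_gap(u, v):
    """Largest absolute difference between corresponding entries of u and v."""
    return max(abs(x - y) for x, y in zip(u, v))


# Hypothetical sample vector.
sample = [1.0, 2.0, -0.5, 4.0, 0.25, -3.0, 7.0]

twice = unitary_dft(unitary_dft(sample))
four_times = unitary_dft(unitary_dft(twice))

# Index reversal modulo n, the discrete analogue of replacing x by -x.
reflected = [sample[(-k) % len(sample)] for k in range(len(sample))]
```

Here `max_gap(twice, reflected)` and `max_gap(four_times, sample)` should both be at the level of floating-point rounding.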

We conclude with the remark that the Fourier transform is a very 
powerful tool in many branches of analysis. Very often a difficult problem 
involving an unknown function $f$ may be converted into a much simpler
problem involving $\hat f$; by solving the simpler problem and then employing
the Fourier inversion formula, one solves the original problem. 

Exercises 

1. Show that $\sin^2 x/x^2 \in L^1(R)$ and that $\sin x/x \notin L^1(R)$.

2. Show that $\lim_{n \to \infty} \int_0^n (\sin x/x)\,dx$ exists and equals $\int_0^\infty (\sin^2 x/x^2)\,dx$.

3. Let $\int_0^\infty (\sin x/x)\,dx$ (understood to mean the limit appearing in
the preceding exercise) be denoted by $C$. (It will be shown in a
later exercise that $C = \pi/2$.) Show that, for any real constant
$a$, $\int_0^\infty (\sin ax/x)\,dx = C \operatorname{sgn} a$, where $\operatorname{sgn} a$, the signum of $a$,
is $+1$, $-1$, or $0$ according as $a > 0$, $a < 0$, $a = 0$, respectively.
Similarly, show that $\int_0^\infty (\sin^2 ax/x^2)\,dx = C\,|a|$.

4. Let $f$ be any step-function vanishing outside a finite interval.
Show that $\hat f(t)$ varies continuously with $t$ and approaches zero
as $|t| \to \infty$. (It is understood that $t$ assumes only real values.)

5. In the preceding exercise replace “step-function” by “continuous
function.” Show that the conclusion still holds.

6. Now replace “continuous function” by “summable function,”
and prove that the conclusion still holds. (Retain the restriction
that $f$ vanishes outside a finite interval.)





7. Now show that the conclusion holds for any function in $L^1(R)$,
even if it does not vanish outside some finite interval. (This
result is known as the Riemann-Lebesgue Lemma; the three
preceding exercises should be looked upon as stepping-stones
to the proof of this result, which plays a major role in Fourier
analysis.)

8. Prove that the function $f(x) = 1/x - 1/(2 \sin \tfrac{1}{2}x)$ is continuous
in $[0, \pi]$ if $f(0) = 0$.

9. Prove that, for any positive integer $n$,

$$\int_0^{\pi} \frac{\sin (n + \tfrac{1}{2})x}{2 \sin \tfrac{1}{2}x}\,dx = \tfrac{1}{2}\pi.$$

10. Use Exercises 5, 8, and 9 to show that $\int_0^\infty (\sin x/x)\,dx = \pi/2$
(and hence that $\int_{-\infty}^{\infty} (\sin x/x)\,dx = \pi$).
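Before attempting a proof of Exercise 9, the stated value can be checked numerically. The following is a rough sketch under stated assumptions: the function name `dirichlet_integral`, the step count, and the chosen values of $n$ are illustrative. It approximates the integral by a midpoint rule, which never evaluates the integrand at the removable singularity $x = 0$.

```python
import math


def dirichlet_integral(n, steps=20_000):
    """Midpoint-rule approximation of the integral in Exercise 9 over [0, pi]."""
    width = math.pi / steps
    total = 0.0
    for m in range(steps):
        x = (m + 0.5) * width  # midpoints stay away from x = 0
        total += math.sin((n + 0.5) * x) / (2.0 * math.sin(0.5 * x))
    return total * width


# Approximate values for a few positive integers n; each should be close to pi/2.
values = {n: dirichlet_integral(n) for n in (1, 2, 5, 10)}
```

A numerical agreement of this kind is only a sanity check on the statement; the exercise still asks for a proof valid for every positive integer $n$.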




SOME SUGGESTIONS FOR FURTHER 

READING 


N. I. Akhiezer and I. M. Glazman 

Theory of Linear Operators in Hilbert Space, Ungar, 1961 

S. Banach 

Théorie des Opérations Linéaires, Chelsea, 1955 (Reprint)

S. Bergman 

The Kernel Function and Conformal Mapping, American Mathematical 
Society, 1950 

R. C. Buck, Editor 

Studies in Modern Analysis, Mathematical Association of America, 1962 

N. Dunford and J. T. Schwartz 

Linear Operators, Interscience, 1958 (Vol. I) and 1963 (Vol. II)

B. Epstein 

Orthogonal Families of Analytic Functions, Macmillan, 1965

P. Halmos

Introduction to Hilbert Space and the Theory of Spectral Multiplicity, 
Chelsea, 1951 

G. Hellwig 

Differential Operators of Mathematical Physics, Addison-Wesley, 1967

E. Hille and R. S. Phillips

Functional Analysis and Semi-groups, American Mathematical Society, 
1957 


J. L. Lions 

Equations Differentielles et Problemes aux Limites, Springer, 1961

L. H. Loomis

An Introduction to Abstract Harmonic Analysis, Van Nostrand, 1953 


J. von Neumann

Mathematische Grundlagen der Quantenmechanik, Springer, 1932 


I. Stakgold

Boundary Value Problems of Mathematical Physics, Macmillan, 1967

M. H. Stone

Linear Transformations in Hilbert Space and their Applications to
Analysis, American Mathematical Society, 1932





B. Sz.-Nagy 

Spektraldarstellung linearer Transformationen des Hilbertschen Raumes,
Springer, 1942 

A. Taylor 

Introduction to Functional Analysis, Wiley, 1958

K. Yosida

Functional Analysis, Academic Press, 1965

K. Yosida

Lectures on Differential and Integral Equations, Interscience, 1960 



CITED REFERENCES 


S. Hartman and J. Mikusinski 

The Theory of Lebesgue Measure and Integration, Pergamon Press, 1961 

F. Riesz and B. Sz.-Nagy 

Functional Analysis, Ungar, 1956 

H. L. Royden 

Real Analysis (2nd Ed.), Macmillan, 1968

W. Rudin

Real and Complex Analysis, McGraw-Hill, 1966 

G. F. Simmons 

Introduction to Topology and Modern Analysis, McGraw-Hill, 1963

E. C. Titchmarsh

The Theory of Functions (2nd Ed.), Oxford University Press, 1939 






INDEX 


Absolute continuity, 117
Absolute convergence, 209 
Absolute value (of Hermitian opera- 
tor), 143 

Additivity, countable, 43 
Adjoint operator, 129-131 
Almost everywhere, definition of, 42, 
57 

Analysis, hard vs. soft, 177 
Approximation, successive, 11
Arithmetic mean, 63 
Ascoli-Arzela theorem, 26 
Associativity of mappings, 132 


Baire category theorem, 17 
Ball, 6. See also Convex sets. 

Banach space, 84 

Banach-Steinhaus theorem, 105, 134 
Basis, 79, 97-101 
Bessel’s inequality, 91-92 
Bilinear expansion (of Hermitian ker- 
nel), 190-192 
Binary rational, 63 
Bolzano-Weierstrass space, 25 
Bolzano-Weierstrass theorem, 5, 10 
Borel field, 60-61 

Bounded linear functionals, 103, 109- 
111 


Bounded set, 5, 25, 26 
Bounded variation, 202 
B-W. See Bolzano-Weierstrass space.


Cantor intersection theorem, 8-9 
Carleson, 109 
Cartesian product, 2 
Category of metric space, 17 
Cauchy sequence, 8 
Cauchy-Hadamard theorem, 213 
Cauchy-Picard theorem, 14 
Cauchy-Schwarz inequality, 4. See also 
Schwarz inequality. 

Chain, 198 

Characteristic equation, 157, 161-162 
Characteristic function, 157 
Characteristic polynomial, 157 
Characteristic values. See Eigenvalues. 
Characteristic vectors, 148, 157-162 
Closed ball, 6 
Closed set, 5 
Closure, 6 

Compact operator, 177-185
Compactness, 24-28 
local, 25, 83-84 
sequential, 25 
Comparable elements, 197 




Complement, 1 
orthogonal, 89 

Complete continuity, 177-185 
Completely continuous operators, 177- 
185 

spectral representation of, 181-185 
Completion, sequential, 9 
Complex number system, 2 
Component (of a set), 30 
Conditional convergence, 209 
Conjugate isomorphism, 120 
Conjugate number, 64 
Continuity, 21-24 
absolute, 117 
complete, 177-185 
of linear functional, 104 
sequential definition of, 21 
uniform, 22. See also Equiconti- 
nuity. 

Contracting mapping, 11
Contraction theorem, 10-13 

application to differential equations, 
13-16 

Convergence, 6-10 
absolute, 209 
conditional, 209 
radius of, 213 
strong, 136 
uniform, 23, 135 
of harmonic functions, 112 
weak, 115-116, 136-137 
Convex sets, 85, 93, 97 
Countable additivity, 43 


Degenerate eigenvalue, 158-159 
Degenerate operator, 180, 190 
Dense subset, 6 
nowhere, 16 

Determinant (of a transformation),
156. See also Characteristic polynomial.

Diagonalization, 160 
Diameter, 9 
Difference (of sets), 1 
Differentiable function, 18 
Differential equations, 13-16 
Dimension, 79 
Dini’s theorem, 28
Direct sum, 146 
Domain, 13, 111 
Dual spaces, 104, 120 
of $L^p$ and $l^p$, 116-120


Egoroff’s theorem, 48
Eigenfunction, 190 


Eigenvalues, 148, 157-171 
approximate, 184-185 
degenerate, 158-159 
estimation of, 166-171, 193-195 
non-degenerate, 158 
of Hermitian operators, 149, 163- 
171 

simple, 158 

Eigenvectors, 148, 157-162 
Empty set, 1 
Equicontinuity, 26 
Equivalence class, 9 
Equivalence relation, 9 
Equivalent norms, 83 
Equivalent sequences, 9 
Equivalent sets, 42 
Essential supremum, 69-70 


$F_\sigma$, 44

Fatou’s lemma, 53-54 
Fejer kernels, 207 
Finite-dimensional space, 79, 84 
First category, 17 
Fixed point, 12 
Fourier coefficients, 100 
Fourier inversion formula, 218 
Fourier series, 105-109. See also
Trigonometric polynomials.

Fourier transform, 214-219 
Fredholm alternative, 185-188 
Fredholm theory of integral equations, 
189-193 

Fubini theorem, 59 
Function, simple, 48-49 
summable, 54 
Functional(s), 102 

linear. See Linear functionals. 
Functional analysis, iii 


$G_\delta$, 44

Geometric mean, 63 
Gram-Schmidt process, 90 


Hahn-Banach theorem, 120-125 
Hamilton-Cayley theorem, 161-162 
Harmonic functions, 111-115 
uniform convergence of, 112 
Harmonic series, 108 
Heine-Borel theorem, 24 
Hermitian kernels, 190-195 
bilinear expansion of, 190-192 
Hermitian matrices, 164-165 
Hermitian operators, 137-143, 162- 
174 



absolute value of, 143 
eigenvalues of, 149, 163-171 
spectral representation of, 171-174, 184

Hilbert spaces, 92-101 
Holder inequality, 64, 71-72 


Ideal, 199-200 
Idempotent, 144 

Identity, resolution of, 172, 175, 183, 
185 

Identity transformation, 127 
Iff, 5 

Image, 22 

Imaginary part of operator, 152 
Index-set, 1 

Inequality, Bessel’s, 91-92 
Cauchy-Schwarz, 4 
Holder, 64, 71-72 
Minkowski, 67, 71-72 
Schwarz, 4, 88 

generalized, 143, 170 
triangle, 2, 81 
Inner measure, 37 
Inner point, 5 
Inner product, 87 

Inner-product spaces, 87-92, 162-168. 

See also Hilbert spaces. 

Integrable function, 54 
Integral, iterated, 59 
Riemann, 29, 54, 60 
Stieltjes, 201-203 

Integral equations, Fredholm theory 
of, 189-193 

Integration. See also Lebesgue measure
and integration. 

of non-negative functions, 49-54 
Interior, 6 
Interpolation, 115 
Intersection (of sets), 1 
Inverse, 131-132 
Inverse image, 22 
Inverse mapping, 131-132 
Inverse operator, 134, 146-148 
Isolated point, 8 
Isomorphic linear spaces, 79, 99 
Isomorphism, 154 
conjugate, 120 
Iterated integral, 59 
Iteration, 11 


Kellogg’s method, 168-171, 193-195 
Kernel, 178, 190 
Fejer, 207 



Hermitian, 190-195 
bilinear expansion of, 190-192 
reproducing, 114-115 
Kolmogoroff, 109 
Kronecker delta, 155


72 

$L^\infty$, 69-70

$L^p$- and $l^p$-spaces, 62-75, 100
Laplace’s equation, 111
Lattice, 199

Least upper bound, 198 
Lebesgue dominated convergence theorem, 55-56

Lebesgue measure and integration, 29- 
61 

Lebesgue number, 27 
Left inverse, 132 
Left-hand derivative, 18 
Legendre polynomials, 101 
Limit, 7 

inferior and superior, 46, 211- 
214 

Limit-point, 5 
Linear combination, 78 
Linear functionals, 102-125
bounded, 103, 109-111 
continuity of, 104 
norm of, 103 

Linear independence, 78, 90, 115 
Linear manifold, 78, 84 
Linear spaces, 78-80 
isomorphic, 79, 99 
normed, 80-86 

Linear transformations, 126-128. See
also Operators.
norm of, 128 
product of, 127 
Linearly ordered, 198 
Liouville’s theorem, 201 
Lipschitz condition, 14 
Littlewood’s three principles, 49 
Local compactness, 25, 83-84 
Lower bound, 198 
Lower limit, 46, 211-214
Lusin’s theorem, 49 


Manifold, linear, 78, 84 
Mapping(s), associativity of, 132 
contracting, 11
inverse, 131-132 
onto, 127, 131 
Matrix. See also Operators. 
representative, 154 





Maximal element, 198 
Maximal ideal, 199-200 
Maximum, 3 
Mean, 63 

Measurable functions, 44-49 
Measurable sets, 36 
Measure, complete, 61

of measurable set, 36. See also Lebesgue
measure and integration.
of open set, 31
outer, 36-37
Metric space(s), 2-28
complete, 8, 67-68, 83 
topology of, 5-7 
Mid-point convex space, 97 
Minimax theorem, 166 
Minkowski inequality, 65, 71-72 
Monotone convergence theorem, 52.
See also Dini’s theorem.


Negative part, 54 
Neighborhood, 22 
Nested sets, 8 
Nilpotent, 128, 138, 162 
Non-degenerate eigenvalue, 158 
Non-negative functions, integration of, 
49-54 

Norm(s), 80 
equivalent, 83 
of linear functional, 103 
of linear transformation, 128 
Normal operators, 150 
Normed linear spaces, 80-86 
Norm-preserving, 150-151 
Nowhere dense subset, 16 
Nowhere-differentiable function, 18- 
21 

n-tuples, 3 
Null-set, 42, 43 
Null-space, 110 


Onto mapping, 127, 131 
Open ball, 6 

Open mapping theorem, 134 
Open sets, 5 
measure of, 31
structure of, 6, 30, 207-208 
Operator(s), 128 
adjoint, 129-131 
compact, 177-185 
completely continuous, 177-185 
spectral representation of, 181- 
185 

degenerate, 180, 190 

Hermitian. See Hermitian operators.
imaginary part of, 152 
inverse, 134, 146-148 
normal, 150 

spectral representation of, 176 
positive, 140-143 
projection, 143-146 
real part of, 152 

representation of, by matrices, 153-157

shift, 184 

spectrum of, 146-152, 200-201 
totally bounded, 177-185 
unitary, 151 

spectral representation of, 176 
Ordered n-tuples, 3 
Orthogonal complement, 89 
Orthogonal vectors, 89 
Orthogonalization, 90 
Orthonormal basis, 97-101 
Orthonormal collection of vectors, 90 
Outer measure, 36-37 


Parallelogram law, 92 
Parseval relation, 98 
Partially ordered set, 197-200 
Partition, 9 

Perpendicular. See Orthogonal. 

Plane sets, 57-60 
Poset, 197-200 

Positive definite expression, 92 
Positive operator, 140-143 
Positive part, 54 

Product (of linear transformations), 
127 

Projection, 143-146 
Projection theorem, 94-95 
Pseudo-metric space, 67 


Quadratically integrable, 62 


Radius of convergence, 213 
Range, 15 

Rational number system, 2 
Rayleigh quotients, 194 
Rayleigh-Ritz procedure, 167-168, 195 
Real number system, 2 
extended, 30, 47, 208-211 
Real part of operator, 152 
Rearrangement of infinite series, 30 
Reflexivity, 197 
Representative matrix, 154 
Reproducing kernel, 114-115 





Resolution of the identity, 172, 175, 
183, 185 

Resolvent set, 147 
Restriction, 47, 121 
Riemann integral, 29, 54, 60 
Riemann-Lebesgue lemma, 219 
Riesz representation theorem, 110-111 
Riesz-Fischer theorem, 67-68 
Right inverse, 131-132 
Right-hand derivative, 18 


Scalar, 76 

Schwarz inequality, 4, 88 
generalized, 143, 170 
Second category, 17 
Self-adjoint, 138 
Separability, 73-75, 97, 113-114
Separation, 125
Sequence(s), 7
Cauchy, 8 
convergent, 7 
equivalent, 9 
of operators, 135 
Sequential compactness, 25 
Sequential completion, 9 
Sequential definition of continuity, 21 
Series, 137, 209 
Fourier, 105-109 
harmonic, 108 
infinite, 30 
Set-inclusion, 1 
Set-theoretic notation, 1 
Shift operator, 184 
Sigma-algebra, 60-61 
Space, Banach, 84 

Bolzano-Weierstrass, 25 
dual. See Dual spaces. 
finite-dimensional, 79-84 
Hilbert, 92-101 
inner-product, 87-92, 162-168 
$L^p$ and $l^p$, 62-74, 100
linear. See Linear space. 
metric. See Metric space. 
pseudo-metric, 67 
vector, 76 
Spanned, 78 

Spectral representation, of completely 
continuous operators, 181-185 
of Hermitian operators, 171-174, 
184 

of normal operators, 176 
of unitary operators, 176 
Spectral theorem, 173, 175, 184 
Spectrum of operator, 146-152, 200- 
201 

Square integrable, 62 


Square root (of positive operator), 
140-143 

Step-function, 48 
Stieltjes integral, 201-203 
Subsequence, 8 
Subspace, 84 

Successive approximation, 11
Supremum, 3
essential, 69-70
Symmetry, 197 


Topology (of metric spaces), 5-7 
Totally bounded operator. See Com- 
pletely continuous operator. 

Totally bounded set, 25 
Totally ordered, 198 
Trace, 160 
Transitivity, 197 
Translation-invariant, 43, 67, 84
Triangle inequality, 2, 81. See also
Minkowski inequality.
Trigonometric polynomials, 100, 205- 
207 


Uniform boundedness, 105, 134 
Uniform continuity, 22 
Uniform convergence, 23, 135
of harmonic functions, 112
Union (of sets), 1 

Unitary operators, 151. See also
Fourier transform. 

Universal set, 1 
Unmeasurable set, 43-44 
Upper bound, 198 
Upper limit, 46, 211-214 
Urysohn’s lemma, 24 


van der Waerden, 21
Vector space, 76 
Volterra equations, 191 


Weierstrass approximation theorem, 
75, 203-205 

Weierstrass nowhere-differentiable
function, 21. See also Bolzano-Weierstrass
theorem.


Zero transformation, 127 
Zero vector, 76-77 
Zorn’s lemma, 198 



